Disk Failure? Help needed

Disk 4 in my array seems to have issues. It automatically dismounted at some point in the last day, and I don't know why or what's going on. Trying to remount it does not work.

 

8V4gb0Z.png

 

When I rebooted the server, it started a file system check. It prompted me to open a certain log, which kept going on and on infinitely:

 

K0927eX.png

 

MOoUjz6.png

 

Again, I do not understand what any of this means.

 

My Unraid's main log looks like this:

 

Jan 24 02:18:13 Tower kernel: md4: writeback error on inode 98313, offset 125083648, sector 674783544
Jan 24 02:18:13 Tower kernel: XFS (sdg1): Filesystem has duplicate UUID 0fb26a6d-0040-405b-92c9-4fe171b93e9b - can't mount
Jan 24 02:18:13 Tower unassigned.devices: Mount of 'sdg1' failed: 'mount: /mnt/disks/WD-WX22DB0KE762: wrong fs type, bad option, bad superblock on /dev/sdg1, missing codepage or helper program, or other error. '
Jan 24 02:18:13 Tower unassigned.devices: Partition 'WD-WX22DB0KE762' cannot be mounted.
Jan 24 02:18:33 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x40 SErr 0x0 action 0x0
Jan 24 02:18:33 Tower kernel: ata6.00: irq_stat 0x40000008
Jan 24 02:18:33 Tower kernel: ata6.00: failed command: WRITE FPDMA QUEUED
Jan 24 02:18:33 Tower kernel: ata6.00: cmd 61/20:30:20:35:d2/00:00:3b:00:00/40 tag 6 ncq dma 16384 out
Jan 24 02:18:33 Tower kernel: res 41/10:00:20:35:d2/00:00:3b:00:00/40 Emask 0x481 (invalid argument) <F>
Jan 24 02:18:33 Tower kernel: ata6.00: status: { DRDY ERR }
Jan 24 02:18:33 Tower kernel: ata6.00: error: { IDNF }
Jan 24 02:18:33 Tower kernel: ata6.00: configured for UDMA/133
Jan 24 02:18:33 Tower kernel: ata6: EH complete
Jan 24 02:18:52 Tower emhttpd: cmd: /usr/local/emhttp/plugins/dynamix.plugin.manager/scripts/plugin update community.applications.plg
Jan 24 02:18:52 Tower root: plugin: running: anonymous
Jan 24 02:18:52 Tower root: plugin: running: anonymous
Jan 24 02:18:52 Tower root: plugin: creating: /boot/config/plugins/community.applications/community.applications-2022.01.22-x86_64-1.txz - downloading from URL https://raw.githubusercontent.com/Squidly271/community.applications/master/archive/community.applications-2022.01.22-x86_64-1.txz
Jan 24 02:18:53 Tower root: plugin: checking: /boot/config/plugins/community.applications/community.applications-2022.01.22-x86_64-1.txz - MD5
Jan 24 02:18:53 Tower root: plugin: running: /boot/config/plugins/community.applications/community.applications-2022.01.22-x86_64-1.txz
Jan 24 02:18:53 Tower root: plugin: running: anonymous
Jan 24 02:18:56 Tower emhttpd: cmd: /usr/local/emhttp/plugins/dynamix.plugin.manager/scripts/plugin update unassigned.devices.plg
Jan 24 02:18:56 Tower root: plugin: running: anonymous
Jan 24 02:18:56 Tower root: plugin: creating: /boot/config/plugins/unassigned.devices/unassigned.devices-2022.01.21.tgz - downloading from URL https://github.com/dlandon/unassigned.devices/raw/master/unassigned.devices-2022.01.21.tgz
Jan 24 02:18:56 Tower nginx: 2022/01/24 02:18:56 [error] 5625#5625: *4570 FastCGI sent in stderr: "Primary script unknown" while reading response header from upstream, client: 192.168.1.217, server: , request: "POST /plugins/unassigned.devices/UnassignedDevices.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.1.5", referrer: "http://192.168.1.5/Main"
Jan 24 02:18:56 Tower nginx: 2022/01/24 02:18:56 [error] 5625#5625: *4086 FastCGI sent in stderr: "Primary script unknown" while reading response header from upstream, client: 10.10.20.217, server: , request: "POST /plugins/unassigned.devices/UnassignedDevices.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "192.168.1.5, referrer: "http://192.168.1.5/Main"
Jan 24 02:18:57 Tower root: plugin: checking: /boot/config/plugins/unassigned.devices/unassigned.devices-2022.01.21.tgz - MD5
Jan 24 02:18:57 Tower root: plugin: creating: /tmp/start_unassigned_devices - from INLINE content
Jan 24 02:18:57 Tower root: plugin: setting: /tmp/start_unassigned_devices - mode to 0770
Jan 24 02:18:57 Tower root: plugin: skipping: /boot/config/plugins/unassigned.devices/unassigned.devices.cfg already exists
Jan 24 02:18:57 Tower root: plugin: skipping: /boot/config/plugins/unassigned.devices/samba_mount.cfg already exists
Jan 24 02:18:57 Tower root: plugin: skipping: /boot/config/plugins/unassigned.devices/iso_mount.cfg already exists
Jan 24 02:18:57 Tower root: plugin: skipping: /tmp/unassigned.devices/smb-settings.conf already exists
Jan 24 02:18:57 Tower root: plugin: skipping: /tmp/unassigned.devices/config/smb-extra.conf already exists
Jan 24 02:18:57 Tower root: plugin: skipping: /tmp/unassigned.devices/add-smb-extra already exists
Jan 24 02:18:57 Tower root: plugin: setting: /tmp/unassigned.devices/add-smb-extra - mode to 0770
Jan 24 02:18:57 Tower root: plugin: skipping: /tmp/unassigned.devices/remove-smb-extra already exists
Jan 24 02:18:57 Tower root: plugin: setting: /tmp/unassigned.devices/remove-smb-extra - mode to 0770
Jan 24 02:18:57 Tower root: plugin: running: anonymous

 

I've attached the SMART reports for both Disk 4 and the Parity Drive (which also has read errors).

 

Can someone please explain to me the problems? Did my disk fail? Is there anything that I should be doing?

tower-smart-20220124-0219 (disk 4).zip tower-smart-20220124-0221 (parity).zip

tower-diagnostics-20220124-0251.zip

Edited by Stubbs

Solved by JorgeB

You should post your entire diagnostics. SMART looks OK, and what can be inferred from the syslog snippet (full diagnostics help much more) is a connection issue (poor cabling, slightly loose, etc.). Reseat the cables.

 

Also be aware that non-locking cables tend to work better on WD drives due to their design (or a locking cable that also has the internal "bump", which is actually rather rare).
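
For anyone who wants to pull the same kind of hint out of a syslog snippet like the one in the first post, here is a minimal Python sketch. It assumes the kernel line layout seen in that log ("kernel: ataN.NN: message"). The names summarize_ata_events and SAMPLE are made up for illustration, and the script only counts events per port; it does not diagnose anything.

```python
import re
from collections import Counter

# Matches kernel ATA lines as they appear in the log above, e.g.
#   "Jan 24 02:18:33 Tower kernel: ata6.00: failed command: WRITE FPDMA QUEUED"
# Group 1 is the port (ata6), group 2 is the rest of the message.
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")

def summarize_ata_events(lines):
    """Count 'failed command' and 'error' messages per ATA port."""
    summary = Counter()
    for line in lines:
        match = ATA_LINE.search(line)
        if not match:
            continue  # not an ATA line (e.g. md4 or plugin messages)
        port, message = match.groups()
        if message.startswith(("failed command:", "error:")):
            summary[(port, message)] += 1
    return summary

# A few lines copied from the log in the first post.
SAMPLE = [
    "Jan 24 02:18:13 Tower kernel: md4: writeback error on inode 98313, offset 125083648, sector 674783544",
    "Jan 24 02:18:33 Tower kernel: ata6.00: exception Emask 0x0 SAct 0x40 SErr 0x0 action 0x0",
    "Jan 24 02:18:33 Tower kernel: ata6.00: failed command: WRITE FPDMA QUEUED",
    "Jan 24 02:18:33 Tower kernel: ata6.00: error: { IDNF }",
    "Jan 24 02:18:33 Tower kernel: ata6: EH complete",
]

if __name__ == "__main__":
    for (port, message), count in summarize_ata_events(SAMPLE).items():
        print(f"{port}: {message} (x{count})")
```

On a live server the same function could be fed an open log file instead of SAMPLE; the path (for example /var/log/syslog) is an assumption about where the log lives. Many repeats on a single port point at that one drive, cable or bay rather than at the whole controller.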

  • Author
6 minutes ago, Squid said:

You should post your entire diagnostics. SMART looks OK, and what can be inferred from the syslog snippet (full diagnostics help much more) is a connection issue (poor cabling, slightly loose, etc.). Reseat the cables.

 

Also be aware that non-locking cables tend to work better on WD drives due to their design (or a locking cable that also has the internal "bump", which is actually rather rare).

I forgot about the diagnostics log, sorry. Here it is (attached).

 

This drive is also in one of the hotswap bays. I had not had this issue before; it has only just occurred, abruptly.

Unraid also just automatically initiated a "read check". I don't know if that will help or not.

tower-diagnostics-20220124-0251.zip

Over the course of disk 4's life (4599 hours), it's had 34526 errors logged. Only the last couple appear within the drive's logs, but they all appear to be connection related. Advice stays the same: reseat the cabling to it / the bay, and also give the tray an extra little nudge. For parity I'd say the same (but only 14000 over 22000 hours). The other drives haven't suffered the same ills.

 

While the odd device reset is OK, continual ones will eventually lead to the drive being disabled, because a write failure is going to happen at some point.

 

It's also possible that the particular bay you're using just doesn't "like" the drive itself. (I've got one bay that refuses to work properly with one particular drive.)
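
To put those two counts on the same scale, here is a small Python sketch that turns them into errors per power-on hour. It uses only the figures quoted in this post; the parity count is the rounded "14000" given above, and the helper name errors_per_hour is made up for illustration.

```python
def errors_per_hour(error_count, power_on_hours):
    """Average number of logged errors per power-on hour."""
    return error_count / power_on_hours

# Figures quoted in the post above (parity count is approximate).
disk4_rate = errors_per_hour(34526, 4599)
parity_rate = errors_per_hour(14000, 22000)

if __name__ == "__main__":
    print(f"disk 4: {disk4_rate:.2f} errors/hour")   # about 7.51
    print(f"parity: {parity_rate:.2f} errors/hour")  # about 0.64
    print(f"ratio:  {disk4_rate / parity_rate:.1f}x")
```

By this rough measure disk 4 has been logging errors at more than ten times the rate of the parity drive. It is an average over each drive's whole life, so it says nothing about when the errors actually happened.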

 

 

  • Author
8 minutes ago, Squid said:

Over the course of disk 4's life (4599 hours), it's had 34526 errors logged. Only the last couple appear within the drive's logs, but they all appear to be connection related. Advice stays the same: reseat the cabling to it / the bay, and also give the tray an extra little nudge. For parity I'd say the same (but only 14000 over 22000 hours). The other drives haven't suffered the same ills.

 

While the odd device reset is OK, continual ones will eventually lead to the drive being disabled, because a write failure is going to happen at some point.

 

It's also possible that the particular bay you're using just doesn't "like" the drive itself. (I've got one bay that refuses to work properly with one particular drive.)

 

 

It's just weird that it was working perfectly fine for the last 6 or so months in this very bay, only to stop working now. Are you sure it's not some kind of XFS filesystem error? Because I actually did manage to re-mount the drive into my array; it's just marked with a red X saying "device disabled, contents emulated". Cue it starting a "Read check", which will take about 26 hours to complete.

 

Also my parity drive currently has 144 read errors, disk 3 has 168 read errors and disk 4 (the broken one) has 1024 read errors.

Edited by Stubbs

Cancel the read check...  No big issue there.  Reseat all the cabling, both ends etc.  Then try rebuilding the parity drive onto itself.

  • Author
1 hour ago, Squid said:

Cancel the read check...  No big issue there.  Reseat all the cabling, both ends etc.  Then try rebuilding the parity drive onto itself.

 

I don't have the time to open up the server and adjust the cabling right now, but I did try putting the problem drive in a different bay (with a different cable). Same problem, same drive.

 

ueMeNwH.png

 

Just for the sake of it, I'm attaching the diagnostics .zip for when the problem disk is actually in the array, but not functioning.

tower-diagnostics-20220123-1728.zip

  • Author

According to people in the Unraid Discord, it's literally a failed disk that needs to be replaced.

  • Community Expert
16 hours ago, Stubbs said:

Same problem, same drive.

Once a device gets disabled it needs to be rebuilt; just changing cables or slots won't fix anything. You can rebuild and see if the problem occurs again. If it does, replace the disk.

  • Author
1 hour ago, JorgeB said:

Once a device gets disabled it needs to be rebuilt; just changing cables or slots won't fix anything. You can rebuild and see if the problem occurs again. If it does, replace the disk.

Yeah, I tried rebuilding, and at the 2% mark, it stopped, disabled the drive and started another "read check".

 

I'm just going to have to RMA and replace.

  • Community Expert
1 hour ago, Stubbs said:

I tried rebuilding, and at the 2% mark, it stopped, disabled the drive

Diags you posted didn't show a rebuild, but yeah, in that case you should replace it.

  • Author
5 minutes ago, JorgeB said:

Diags you posted didn't show a rebuild, but yeah, in that case you should replace it.

 

I can't remember if the previous diagnostics was before or after I tried the rebuild.

 

This one attached is after the failed rebuild and completed read check:

tower-diagnostics-20220124-2349.zip

Edited by Stubbs

  • Community Expert
  • Solution

It's logged more like a connection/power issue, but since it failed in a different slot it's likely a disk problem.

  • Author

I've just taken the drive out, connected it to my Windows computer with an HDD docking bay, formatted it as NTFS and it seems to be working fine.

 

At this point, I honestly have no clue what's going on. I'll try formatting it again on Unraid in yet another slot.

  • Author

Well, looks like I fixed my problem. Thanks for the help.

 

I have a hunch for what caused it, but it's too stupid for words.

  • Community Expert
7 minutes ago, Stubbs said:

but it's too stupid for words.

Now you made me curious :)

 

52 minutes ago, Stubbs said:

Well, looks like I fixed my problem. Thanks for the help.

 

I have a hunch for what caused it, but it's too stupid for words.

Spill!

 

Novel ways of causing intermittent issues are always interesting, especially non-intuitive stuff.

  • Author
59 minutes ago, JorgeB said:

Now you made me curious :)

 

 

12 minutes ago, JonathanM said:

Spill!

 

Novel ways of causing intermittent issues are always interesting, especially non-intuitive stuff.

Alright, I'm not 100% sure this is what caused it, but when removing the tray, I noticed the front two screws holding the hard drive in had slightly thicker heads than the back two. The drive was still connected to the SATA and power connector in the drive bay, but it probably wasn't the most stable connection.

I replaced those two thicker screws with the correct thinner-head ones, and all of a sudden it works fine.

Very plausible. The SATA slip fit connection is precise to fractions of a mm. Plastic and metal drive tray assemblies, not so much.

 

I make it a habit to do the final tightening of the drive tray screws while holding only the tray, letting the drive hang freely against the partially tightened screws. That way all the tolerances should stack up to put the drive at the absolute rear of the tray.

 

SATA / SAS connectors are an electrical nightmare; they cause WAY more drive errors than actual disk failures do.
