Losing cache disk(s) on a regular basis.

I'm looking for a second opinion on an issue I've been trying to solve for a few weeks. It is slow to troubleshoot because it takes some time to recur.

 

Some history:
The server was running a cache pool of two 240GB SSDs in BTRFS RAID1.
About 2 months ago I upgraded the cache to two new, larger disks: 2x 1TB (CT1000MX500SSD1).
As far as I'm aware, I swapped them properly, one at a time, to preserve the mirrored cache data.


It seems I lose a disk roughly every 1 to 1.5 weeks, with the following errors:

Jan 24 02:07:53 Mountain kernel: ata1.00: exception Emask 0x0 SAct 0xffffffff SErr 0x0 action 0x6 frozen
Jan 24 02:07:53 Mountain kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Jan 24 02:07:53 Mountain kernel: ata1.00: cmd 61/20:00:30:b4:3a/00:00:08:00:00/40 tag 0 ncq dma 16384 out
Jan 24 02:07:53 Mountain kernel:         res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)

[...]

Jan 24 02:07:53 Mountain kernel: ata1.00: status: { DRDY }
Jan 24 02:07:53 Mountain kernel: ata1: hard resetting link
Jan 24 02:07:59 Mountain kernel: ata1: found unknown device (class 0)
Jan 24 02:08:03 Mountain kernel: ata1: softreset failed (device not ready)
Jan 24 02:08:03 Mountain kernel: ata1: hard resetting link
Jan 24 02:08:09 Mountain kernel: ata1: found unknown device (class 0)
Jan 24 02:08:13 Mountain kernel: ata1: softreset failed (device not ready)
Jan 24 02:08:13 Mountain kernel: ata1: hard resetting link
Jan 24 02:08:19 Mountain kernel: ata1: found unknown device (class 0)
Jan 24 02:08:24 Mountain kernel: ata1: link is slow to respond, please be patient (ready=0)
Jan 24 02:08:48 Mountain kernel: ata1: softreset failed (device not ready)
Jan 24 02:08:48 Mountain kernel: ata1: limiting SATA link speed to 3.0 Gbps
Jan 24 02:08:48 Mountain kernel: ata1: hard resetting link
Jan 24 02:08:53 Mountain kernel: ata1: found unknown device (class 0)
Jan 24 02:08:54 Mountain kernel: ata1: softreset failed (device not ready)
Jan 24 02:08:54 Mountain kernel: ata1: reset failed, giving up
Jan 24 02:08:54 Mountain kernel: ata1.00: disable device
Jan 24 02:08:54 Mountain kernel: ata1: EH complete
Jan 24 02:08:54 Mountain kernel: sd 2:0:0:0: [sdc] tag#12 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s
Jan 24 02:08:54 Mountain kernel: sd 2:0:0:0: [sdc] tag#20 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=DRIVER_OK cmd_age=0s
Jan 24 02:08:54 Mountain kernel: sd 2:0:0:0: [sdc] tag#12 CDB: opcode=0x2a 2a 00 08 38 76 98 00 00 40 00
Jan 24 02:08:54 Mountain kernel: sd 2:0:0:0: [sdc] tag#20 CDB: opcode=0x28 28 00 0b 13 24 e0 00 00 20 00
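
For anyone trying to see at a glance which port is misbehaving in a long syslog, here is a minimal Python sketch that counts the libata messages shown above per `ataN` port. The function name, the event labels and the sample list are my own illustration, not an Unraid tool, and the message fragments are only the ones that appear in this excerpt.

```python
import re
from collections import Counter, defaultdict

# Matches kernel lines such as
# "Jan 24 02:07:53 Mountain kernel: ata1.00: failed command: WRITE FPDMA QUEUED"
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.+)$")

# Message fragments copied from the log excerpt; the labels are made up here.
EVENTS = {
    "failed command": "failed_command",
    "hard resetting link": "hard_reset",
    "softreset failed": "softreset_failed",
    "limiting SATA link speed": "speed_limited",
    "disable device": "device_disabled",
}


def summarize_ata_events(lines):
    """Count the known error messages per ATA port (ata1, ata2, ...)."""
    summary = defaultdict(Counter)
    for line in lines:
        match = ATA_LINE.search(line)
        if not match:
            continue  # not an ataN line (e.g. the "sd 2:0:0:0" lines)
        port, message = match.groups()
        for fragment, label in EVENTS.items():
            if fragment in message:
                summary[port][label] += 1
    return {port: dict(counts) for port, counts in summary.items()}


# A few lines taken from the excerpt above.
sample = [
    "Jan 24 02:07:53 Mountain kernel: ata1.00: failed command: WRITE FPDMA QUEUED",
    "Jan 24 02:07:53 Mountain kernel: ata1: hard resetting link",
    "Jan 24 02:08:03 Mountain kernel: ata1: softreset failed (device not ready)",
    "Jan 24 02:08:54 Mountain kernel: ata1.00: disable device",
]
print(summarize_ata_events(sample))
```

The function accepts any iterable of lines, so an open syslog file can be passed in directly. A port that accumulates `device_disabled` events is the one the kernel gave up on.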


Initially the logs were unclear about which disk was at fault, so I tried the following in sequence.
Configuration: 2x 1TB (CT1000MX500SSD1).
- Moved the disks to different SATA ports on the motherboard.
- Replaced the SATA cables with new ones.
After marking the specific disk that was causing the issue:
- Switched the SATA power connector on the disk reporting errors.
- Replaced the SATA cable of the disk reporting errors and moved it to another port.
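
Since the kernel reports errors per `ataN` port while the GUI shows `sdX` names, one way to tie the two together is to resolve the sysfs link of the block device. On many Linux systems with SATA disks the resolved path contains the `ataN` component. The sketch below assumes that layout; the helper names and the example path are hypothetical and not taken from my server.

```python
import os
import re


def ata_port_from_sysfs_path(path):
    """Return the 'ataN' component of a resolved sysfs path, or None if absent."""
    match = re.search(r"/(ata\d+)/", path)
    return match.group(1) if match else None


def ata_port_for_device(device, sysfs_root="/sys/block"):
    """Resolve /sys/block/<device> and extract its ATA port, if it has one."""
    resolved = os.path.realpath(os.path.join(sysfs_root, device))
    return ata_port_from_sysfs_path(resolved)


# Hypothetical resolved path, for illustration only.
example = (
    "/sys/devices/pci0000:00/0000:00:1f.2/ata1/host2/"
    "target2:0:0/2:0:0:0/block/sdc"
)
print(ata_port_from_sysfs_path(example))
```

On the server itself, `ata_port_for_device("sdc")` would do the lookup for a real disk. Disks behind a controller that does not use libata have no `ataN` component, in which case the function returns `None`.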

-----------------------------------------------------------------------------------------------------------------------------------------------------
Believing this was a single disk failure, I removed the disk reporting errors and changed the hardware configuration.
Configuration: 2x 1TB (CT1000MX500SSD1 and Samsung_SSD_870_EVO_1TB).
- The error occurred again, this time on the other CT1000MX500SSD1 (sdc).


So I'm out of logical ideas. Should I also swap out this CT1000MX, or is it more likely something else?

These are the possibilities I'm considering:
- I had two bad disks, which seems very unlikely.
- The motherboard is having issues, possibly only with this type of disk.
- A software misconfiguration is causing this disk access issue.

I'm hoping someone can help, or at least suggest a few other things to try.

Attached are the diagnostics in case more specific information is needed (syslog1 is probably the most useful).

mountain-diagnostics-20230124-1848.zip

  • Community Expert

Looks more like a power/cable problem, but if it only happens with the Crucial and not with the Samsung the board/controller might not like them.

  • Author

Just to update this thread, in case someone stumbles across it in the future.
I have now replaced the Crucial disk with a different brand (WD Blue SSD 1TB).
Based on the power supply remark from JorgeB (thanks, that was a good thing to change as well), I also moved it to another rail on the power supply.

Now the waiting game starts: another 1 to 1.5 weeks to see whether the error shows up again.

I mostly want an operational server rather than to hunt down the exact cause. So if the issue is resolved now, I can't be sure whether it was the Crucial disks in this specific configuration or a bad supply rail or SATA connector.

This means that if I don't reply here within the next 2-3 weeks, the issue is probably resolved.
