x3n0n Posted May 10, 2023

Hi there folks, I'm running Unraid in a TerraMaster 2-bay NAS enclosure and it's been fun! Sadly, my new config has been running for about two months now and one SSD is already dropping from the redundant cache pool. I don't know if it's the TerraMaster enclosure or a faulty SSD, and I don't want to just swap the SSDs between the bays. If it were a connection issue, I'd expect write errors on both SSDs. I'm running mostly from cache; for main storage I have a 32GB USB flash drive. I attached the diagnostics. Can somebody point me in the right direction?

wkr

diagnostics-20230510-1915.zip
x3n0n Posted May 10, 2023

Around line 1573 in the syslog it says:

May 10 19:01:59 Hafen kernel: ata2.00: exception Emask 0x0 SAct 0x100000 SErr 0x0 action 0x6 frozen
May 10 19:01:59 Hafen kernel: ata2.00: failed command: READ FPDMA QUEUED
May 10 19:01:59 Hafen kernel: ata2: hard resetting link
May 10 19:02:04 Hafen kernel: ata2: link is slow to respond, please be patient (ready=0)
May 10 19:02:09 Hafen kernel: ata2: COMRESET failed (errno=-16)
May 10 19:02:59 Hafen kernel: ata2: reset failed, giving up
May 10 19:02:59 Hafen kernel: ata2.00: disable device

Sounds like an SSD that's getting dropped, but I don't know why.
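For anyone hitting the same messages: the pattern above (exception, hard reset, COMRESET failed, disable device) means the kernel gave up on the SATA link itself, not on a single bad sector. A minimal sketch for tallying which ATA port throws these errors; the heredoc embeds the sample lines from above, and on a real system you would point it at your syslog instead (path assumed, e.g. /var/log/syslog):

```shell
#!/bin/sh
# Sketch: count ATA error events per port in a syslog excerpt.
# The heredoc below is sample data (the lines from this post); on a
# real system set LOG=/var/log/syslog (path assumed) instead.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
May 10 19:01:59 Hafen kernel: ata2.00: exception Emask 0x0 SAct 0x100000 SErr 0x0 action 0x6 frozen
May 10 19:01:59 Hafen kernel: ata2.00: failed command: READ FPDMA QUEUED
May 10 19:01:59 Hafen kernel: ata2: hard resetting link
May 10 19:02:09 Hafen kernel: ata2: COMRESET failed (errno=-16)
May 10 19:02:59 Hafen kernel: ata2: reset failed, giving up
May 10 19:02:59 Hafen kernel: ata2.00: disable device
EOF
# Extract the ata port from each kernel line and tally occurrences,
# most frequent first.
counts=$(grep -Eo 'ata[0-9]+' "$LOG" | sort | uniq -c | sort -rn)
echo "$counts"
worst=$(echo "$counts" | head -n1 | awk '{print $2}')
echo "Most errors on port: $worst"
rm -f "$LOG"
```

With the sample data this reports all six events on ata2, which matches the drive that keeps dropping.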
JorgeB Posted May 10, 2023

This is usually a power/connection problem. Is the SSD connected with cables or directly?
x3n0n Posted May 10, 2023

It is connected directly to the NAS enclosure; I've made sure it sits securely in the slot.
JorgeB Posted May 10, 2023

Swap slots with a different device if possible; then, if it happens again, see if the problem follows the slot or the device.
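One extra data point that helps tell slot from drive: SMART attribute 199 (UDMA_CRC_Error_Count) counts transfer errors on the link, so a rising raw value alongside otherwise healthy media attributes points at the cable, backplane, or slot rather than the flash. A sketch with sample smartctl output embedded (the raw value of 37 and the device name are made up for illustration); on a real system you would feed it `smartctl -A /dev/sdX` output instead:

```shell
#!/bin/sh
# Sketch: check the UDMA CRC error counter, which indicates link
# (cable/backplane/slot) problems rather than failing flash.
# The heredoc is illustrative sample data; on a real system use:
#   smartctl -A /dev/sdb    (device name assumed)
SMART=$(cat <<'EOF'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       37
EOF
)
# Pull the raw value of attribute 199 (last field of that row).
crc=$(printf '%s\n' "$SMART" | awk '$1 == 199 {print $NF}')
echo "UDMA CRC error count: $crc"
if [ "${crc:-0}" -gt 0 ]; then
    echo "Nonzero CRC errors: suspect the link (cable/backplane/slot)."
fi
```

Note the counter never resets, so what matters is whether it keeps climbing after you reseat or move the drive.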
x3n0n Posted May 10, 2023

I pulled a drive and found out that it's the one with the errors. I put it in the other slot and it works normally. So it seems one of the bays is faulty, not the drive. Goodbye NAS enclosure, it seems. I wrote to TerraMaster support asking whether they can provide another PCIe-to-SATA card and am waiting for an answer.
x3n0n Posted May 27, 2023 (edited)

In the meantime I got a new system, and it turns out the drive acts up there as well. I put the Unraid flash drive and the SSDs in the new system, booted Unraid and started the array, only to see the same drive dropped with the same errors in the log.

I wanted to RMA the drive, so I wrote to support and they told me to send it in. I secure-erased the SSD and was about to ship it, but I was curious whether it still had write errors. So I formatted it ext4 on another machine, and it started writing fine. So I put it back into my Unraid system. The config was fine, no missing drives. But there are also no errors from the cache or btrfs, which is odd, because the secure erase zeroed the drive.

What do I have to do to get the zeroed drive actually working in the cache again, so that it has a btrfs filesystem and all the RAID1 data on it?

//Edit: Did a filesystem check on the cache in maintenance mode and now it reports a missing device:

Opening filesystem to check...
warning, device 2 is missing
Checking filesystem on /dev/sdc1
UUID: 0936923a-0844-4c80-9929-d48d3e40bfb0
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 384472600576 bytes used, no error found
total csum bytes: 371082512
total tree bytes: 919502848
total fs tree bytes: 457474048
total extent tree bytes: 63045632
btree space waste bytes: 108854723
file data blocks allocated: 386939850752
 referenced 382193192960

Edited May 27, 2023 by x3n0n: fs check
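Worth spelling out: the check reports "no error found" for the metadata on device 1; the real signal is the "device 2 is missing" warning, i.e. the pool is degraded, not corrupt. A small sketch that scans check output for exactly that condition; the heredoc is an abbreviated copy of the output above, and on a live system you would capture it from `btrfs check --readonly /dev/sdc1` with the array stopped:

```shell
#!/bin/sh
# Sketch: scan `btrfs check` output for a degraded pool.
# The heredoc is an abbreviated copy of the output in this thread;
# on a live system capture it with (array stopped):
#   btrfs check --readonly /dev/sdc1
CHECK=$(cat <<'EOF'
Opening filesystem to check...
warning, device 2 is missing
Checking filesystem on /dev/sdc1
UUID: 0936923a-0844-4c80-9929-d48d3e40bfb0
found 384472600576 bytes used, no error found
EOF
)
# Count missing-device warnings; any nonzero count means the
# filesystem mounted/checked without one of its member devices.
missing=$(printf '%s\n' "$CHECK" | grep -c 'device .* is missing')
echo "Missing devices reported: $missing"
if [ "$missing" -gt 0 ]; then
    echo "Pool is degraded: the surviving metadata is clean, but RAID1 redundancy is gone."
fi
```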
x3n0n Posted May 27, 2023

So for clarification: I have a pool with 2 SSDs in RAID1, but the second SSD was erased. The Main view shows no config error, and the fs check in maintenance mode shows one device missing. How do I go on from here?

- Tell Unraid the second drive was zeroed?
- Format drive 2 with btrfs?
- Tell the pool the second drive is not the same anymore and that it has to rebuild the RAID1 from drive 1?
JorgeB Posted May 28, 2023

Please post the diagnostics.
x3n0n Posted May 28, 2023

Booted Unraid, started in maintenance mode and did the fs check, stopped, started normally, stopped, and took diagnostics:

diagnostics-20230528-1958.zip
JorgeB Posted May 29, 2023 (Solution)

With the array stopped, unassign the erased device, start the array, stop the array, re-assign the erased device, and start the array again; that should do it. You can also post new diags to confirm.
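After the re-assign and rebalance, the pool state can also be confirmed from a terminal. A sketch that checks `btrfs filesystem show` output for stragglers; the heredoc stands in for real output (device counts match this thread, but the sizes and device paths are illustrative), and on a live system you would run `btrfs filesystem show /mnt/cache` (mount point assumed) instead:

```shell
#!/bin/sh
# Sketch: confirm both pool members are present after the re-assign.
# The heredoc is illustrative sample output; on a live system use:
#   btrfs filesystem show /mnt/cache   (mount point assumed)
SHOW=$(cat <<'EOF'
Label: none  uuid: 0936923a-0844-4c80-9929-d48d3e40bfb0
    Total devices 2 FS bytes used 358.05GiB
    devid    1 size 465.76GiB used 360.03GiB path /dev/sdc1
    devid    2 size 465.76GiB used 360.03GiB path /dev/sdb1
EOF
)
# One "devid" line per member device; a dropped member shows up
# either as a lower count or as a "*** Some devices missing" note.
devices=$(printf '%s\n' "$SHOW" | grep -c 'devid')
if printf '%s\n' "$SHOW" | grep -qi 'missing'; then
    echo "Pool still degraded!"
else
    echo "All $devices devices present."
fi
```

`btrfs dev stats /mnt/cache` is also worth a look afterwards: its per-device error counters should stay at zero once the flaky slot is out of the picture.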
x3n0n Posted May 29, 2023

Did that (and also waited for the btrfs operations to finish), and it seems everything is in order now. Both btrfs balances exited with 0. See attached.

hafen-diagnostics-20230529-1704.zip
JorgeB Posted May 29, 2023

Yes, looks good, and both devices are part of the pool.