x3n0n Posted May 10, 2023

Hi there folks, I'm running Unraid in a TerraMaster 2-bay NAS enclosure and it's been fun! Sadly, my new config has been running for about two months now and one SSD is already dropping from the redundant cache pool. I don't know if it's the TerraMaster enclosure or a faulty SSD, and I don't want to just swap the SSDs between the bays. If it were a connection issue, I'd expect write errors on both SSDs. I'm running mostly from cache; for main storage I have a 32GB USB flash drive. I attached the diagnostics. Can somebody point me in the right direction?

wkr

diagnostics-20230510-1915.zip
x3n0n Posted May 10, 2023

Around line 1573 in the syslog it says:

May 10 19:01:59 Hafen kernel: ata2.00: exception Emask 0x0 SAct 0x100000 SErr 0x0 action 0x6 frozen
May 10 19:01:59 Hafen kernel: ata2.00: failed command: READ FPDMA QUEUED
May 10 19:01:59 Hafen kernel: ata2: hard resetting link
May 10 19:02:04 Hafen kernel: ata2: link is slow to respond, please be patient (ready=0)
May 10 19:02:09 Hafen kernel: ata2: COMRESET failed (errno=-16)
May 10 19:02:59 Hafen kernel: ata2: reset failed, giving up
May 10 19:02:59 Hafen kernel: ata2.00: disable device

Sounds like an SSD that's getting dropped, but I don't know why.
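For anyone hitting the same messages: the pattern above (exception, hard reset, COMRESET failed, disable device) means the kernel gave up on the SATA link itself, not on a single bad sector. A minimal sketch for tallying which ATA port throws these errors; the heredoc embeds the sample lines from above, and on a real system you would point it at your syslog instead (path assumed, e.g. /var/log/syslog):

```shell
#!/bin/sh
# Sketch: count ATA error events per port in a syslog excerpt.
# The heredoc below is sample data (the lines from this post); on a
# real system set LOG=/var/log/syslog (path assumed) instead.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
May 10 19:01:59 Hafen kernel: ata2.00: exception Emask 0x0 SAct 0x100000 SErr 0x0 action 0x6 frozen
May 10 19:01:59 Hafen kernel: ata2.00: failed command: READ FPDMA QUEUED
May 10 19:01:59 Hafen kernel: ata2: hard resetting link
May 10 19:02:09 Hafen kernel: ata2: COMRESET failed (errno=-16)
May 10 19:02:59 Hafen kernel: ata2: reset failed, giving up
May 10 19:02:59 Hafen kernel: ata2.00: disable device
EOF
# Extract the ata port from each kernel line and tally occurrences,
# most frequent first.
counts=$(grep -Eo 'ata[0-9]+' "$LOG" | sort | uniq -c | sort -rn)
echo "$counts"
worst=$(echo "$counts" | head -n1 | awk '{print $2}')
echo "Most errors on port: $worst"
rm -f "$LOG"
```

With the sample data this reports all six events on ata2, which matches the drive that keeps dropping.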
JorgeB Posted May 10, 2023

This is usually a power/connection problem. Is the SSD connected with cables or directly?
x3n0n Posted May 10, 2023

It is connected directly to the NAS enclosure; I've made sure it sits securely in the slot.
JorgeB Posted May 10, 2023

Swap slots with a different device if possible; then, if it happens again, see if the problem follows the slot or the device.
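One extra data point that helps tell slot from drive: SMART attribute 199 (UDMA_CRC_Error_Count) counts transfer errors on the link, so a rising raw value alongside otherwise healthy media attributes points at the cable, backplane, or slot rather than the flash. A sketch with sample smartctl output embedded (the raw value of 37 and the device name are made up for illustration); on a real system you would feed it `smartctl -A /dev/sdX` output instead:

```shell
#!/bin/sh
# Sketch: check the UDMA CRC error counter, which indicates link
# (cable/backplane/slot) problems rather than failing flash.
# The heredoc is illustrative sample data; on a real system use:
#   smartctl -A /dev/sdb    (device name assumed)
SMART=$(cat <<'EOF'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       37
EOF
)
# Pull the raw value of attribute 199 (last field of that row).
crc=$(printf '%s\n' "$SMART" | awk '$1 == 199 {print $NF}')
echo "UDMA CRC error count: $crc"
if [ "${crc:-0}" -gt 0 ]; then
    echo "Nonzero CRC errors: suspect the link (cable/backplane/slot)."
fi
```

Note the counter never resets, so what matters is whether it keeps climbing after you reseat or move the drive.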
x3n0n Posted May 10, 2023

I pulled a drive and found out that it's the one with the errors. I put it in the other slot and it works normally. So it seems one of the bays is faulty, not the drive. Goodbye NAS enclosure, it seems. I wrote to TerraMaster support asking whether they can provide another PCIe-to-SATA card and am waiting for an answer.
x3n0n Posted May 27, 2023 (edited)

In the meantime I got a new system, and it turns out the drive acts up there as well. I put the Unraid flash drive and the SSDs in the new system, booted Unraid and started the array, only to see the same drive dropped with the same errors in the log.

I wanted to RMA the drive, so I wrote to support and they told me to send it in. I secure-erased the SSD and was about to ship it, but I was curious whether it still had write errors. So I formatted it ext4 on another machine, and it started writing fine. So I put it back into my Unraid system. The config was fine, no missing drives. But there are also no errors from the cache or btrfs, which is odd, because the secure erase zeroed the drive.

What do I have to do to get the zeroed drive actually working in the cache again, so that it has a btrfs filesystem and all the RAID1 data on it?

//Edit: Did a filesystem check on the cache in maintenance mode and now it reports a missing device:

Opening filesystem to check...
warning, device 2 is missing
Checking filesystem on /dev/sdc1
UUID: 0936923a-0844-4c80-9929-d48d3e40bfb0
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space tree
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 384472600576 bytes used, no error found
total csum bytes: 371082512
total tree bytes: 919502848
total fs tree bytes: 457474048
total extent tree bytes: 63045632
btree space waste bytes: 108854723
file data blocks allocated: 386939850752
 referenced 382193192960

Edited May 27, 2023 by x3n0n: fs check
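Worth spelling out: the check reports "no error found" for the metadata on device 1; the real signal is the "device 2 is missing" warning, i.e. the pool is degraded, not corrupt. A small sketch that scans check output for exactly that condition; the heredoc is an abbreviated copy of the output above, and on a live system you would capture it from `btrfs check --readonly /dev/sdc1` with the array stopped:

```shell
#!/bin/sh
# Sketch: scan `btrfs check` output for a degraded pool.
# The heredoc is an abbreviated copy of the output in this thread;
# on a live system capture it with (array stopped):
#   btrfs check --readonly /dev/sdc1
CHECK=$(cat <<'EOF'
Opening filesystem to check...
warning, device 2 is missing
Checking filesystem on /dev/sdc1
UUID: 0936923a-0844-4c80-9929-d48d3e40bfb0
found 384472600576 bytes used, no error found
EOF
)
# Count missing-device warnings; any nonzero count means the
# filesystem mounted/checked without one of its member devices.
missing=$(printf '%s\n' "$CHECK" | grep -c 'device .* is missing')
echo "Missing devices reported: $missing"
if [ "$missing" -gt 0 ]; then
    echo "Pool is degraded: the surviving metadata is clean, but RAID1 redundancy is gone."
fi
```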
x3n0n Posted May 27, 2023

So for clarification: I have a pool with 2 SSDs in RAID1, but the second SSD was erased. The Main view shows no config error, and the fs check in maintenance mode shows one device missing. How do I go on from here?

- Tell Unraid the second drive was zeroed?
- Format drive 2 with btrfs?
- Tell the pool the second drive is not the same anymore and that it has to rebuild the RAID1 from drive 1?
JorgeB Posted May 28, 2023

Please post the diagnostics.
x3n0n Posted May 28, 2023

Booted Unraid, started in maintenance mode and did the fs check, stopped, started normally, stopped, and took diagnostics:

diagnostics-20230528-1958.zip
JorgeB Posted May 29, 2023 (Solution)

With the array stopped, unassign the erased device, start the array, stop the array, re-assign the erased device, and start the array again; that should do it. You can also post new diags to confirm.
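After the re-assign and rebalance, the pool state can also be confirmed from a terminal. A sketch that checks `btrfs filesystem show` output for stragglers; the heredoc stands in for real output (device counts match this thread, but the sizes and device paths are illustrative), and on a live system you would run `btrfs filesystem show /mnt/cache` (mount point assumed) instead:

```shell
#!/bin/sh
# Sketch: confirm both pool members are present after the re-assign.
# The heredoc is illustrative sample output; on a live system use:
#   btrfs filesystem show /mnt/cache   (mount point assumed)
SHOW=$(cat <<'EOF'
Label: none  uuid: 0936923a-0844-4c80-9929-d48d3e40bfb0
    Total devices 2 FS bytes used 358.05GiB
    devid    1 size 465.76GiB used 360.03GiB path /dev/sdc1
    devid    2 size 465.76GiB used 360.03GiB path /dev/sdb1
EOF
)
# One "devid" line per member device; a dropped member shows up
# either as a lower count or as a "*** Some devices missing" note.
devices=$(printf '%s\n' "$SHOW" | grep -c 'devid')
if printf '%s\n' "$SHOW" | grep -qi 'missing'; then
    echo "Pool still degraded!"
else
    echo "All $devices devices present."
fi
```

`btrfs dev stats /mnt/cache` is also worth a look afterwards: its per-device error counters should stay at zero once the flaky slot is out of the picture.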
x3n0n Posted May 29, 2023

Did that (and also waited for the btrfs operations to finish), and it seems everything is in order now. Both btrfs balances exited with 0. See attached.

hafen-diagnostics-20230529-1704.zip
JorgeB Posted May 29, 2023

Yes, looks good, and both devices are part of the pool.