Replace a single SSD in a RAID 1 cache pool



Hi there,

 

I'm still evaluating Unraid and plan to buy it soon. I've already gained a lot of experience: my USB stick broke, I created a new one without a backup and got everything working again. I tried mounting my single disks in the array with the different file systems and found that XFS and BTRFS are the options; I wasn't able to mount a single ZFS disk, but anyway... so far everything went fine and I learned a lot.

 

BUT... I had a lot of BTRFS cache errors on the NVMe in the syslog. The second cache device is SATA3 and was always fine. So I ordered a new NVMe cache disk, but I was not able to replace the old one. Whatever I tried, the server hung, the screen filled with garbage with the new empty NVMe SSD installed, and so on.

 

Then I noticed I wasn't able to mount the second SATA3 cache disk either. Whatever I tried, I couldn't get it to work.

I finally reinstalled the old NVMe SSD and... it's working again. I've now backed everything up and moved it to the array.

 

That brings me to the following question: what is the benefit of a RAID 1 BTRFS cache pool of two devices if I can't access, mount, or replace a single SSD of the pool? What if the old one I wanted to replace today on purpose had instead actually died? Would I be lost then?

Then I wouldn't need a mirror at all.
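
For reference, here is my understanding of what replacing one member of a btrfs RAID 1 pool should look like at the command line. A minimal sketch in Python, just wrapping the btrfs tools; the device paths and mount point are assumptions for my box, and I don't know whether the Unraid GUI does exactly this under the hood:

# Minimal sketch of a manual btrfs RAID 1 member replacement (run as root).
# ASSUMPTIONS: /dev/nvme0n1p1 is the failing member, /dev/nvme1n1p1 is the
# new SSD, and the pool is mounted at /mnt/cache - adjust all three.
import subprocess

OLD_DEV = "/dev/nvme0n1p1"   # failing pool member (hypothetical path)
NEW_DEV = "/dev/nvme1n1p1"   # replacement SSD (hypothetical path)
MOUNT   = "/mnt/cache"       # where the pool is mounted

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Online replace: btrfs rebuilds the data onto the new device while mounted.
run("btrfs", "replace", "start", "-f", OLD_DEV, NEW_DEV, MOUNT)

# Monitor progress until the replace reports completion.
run("btrfs", "replace", "status", MOUNT)

# Verify checksums across both members afterwards.
run("btrfs", "scrub", "start", "-B", MOUNT)

If that is roughly what the GUI is supposed to trigger when you swap the assignment, then losing one member of the mirror shouldn't be fatal, which is exactly why my experience above confuses me.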

 

I hope we can sort this issue out so I'm prepared for whatever might come.
I'm willing to test, reformat, and share my results, or follow your suggested best practice if you have one.

 

Kind regards,

PhantombrainM

 

P.S.: There seems to be no documentation for replacing a device in a cache RAID pool. All the links I found are broken or gone.


Thank you so far. I've read it all. It seems there is a problem, maybe an Unraid bug. Not sure yet.

 

First, I installed the old NVMe and the second SATA3 cache disk:

 

1. Cache: old NVMe

2. Cache 2: SATA3 SSD

 

-> Unraid booted again. I ran a scrub and corrected the errors. I stopped the array and unassigned cache slot 1:

 

1. Cache: Unassigned

2. Cache 2: SATA3 SSD

 

-> Now I was able to start the array and the cache. Everything was fine! I shut down the server and installed the new NVMe SSD.
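
For anyone retracing this: while the pool was running degraded I checked its state with the usual btrfs commands. A small sketch; /mnt/cache is where my pool is mounted:

# Quick health check of the (possibly degraded) btrfs cache pool.
# ASSUMPTION: the pool is mounted at /mnt/cache on this machine.
import subprocess

MOUNT = "/mnt/cache"

for cmd in (
    ["btrfs", "filesystem", "show", MOUNT],  # which devices the pool sees
    ["btrfs", "device", "stats", MOUNT],     # per-device error counters
    ["btrfs", "scrub", "status", MOUNT],     # result of the last scrub
):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=False)         # keep going even if one fails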

 

After a reboot I tried to start the array with Cache 1 unassigned and Cache 2: SATA3 SSD as before -> everything was fine!

I stopped the array and assigned the new NVMe:

 

1. Cache: new NVMe <- message "This will format and delete all data on that device", or something similar.

2. Cache 2: SATA3 SSD

 

I started the array and it all got messed up again. The console showed garbled output like in the Matrix movie and Unraid hung completely.

The syslog shows call traces.

 

P.S.: I found out that it's random. The new Emtec X300 500GB SSD seems unstable or incompatible. Even when I add a new pool and just format the Emtec as XFS, Unraid also hangs, the console on VGA goes crazy, and the syslog shows errors.

 

Tomorrow I will boot Ubuntu to see whether it's Unraid or a general hardware problem.
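
My plan for the Ubuntu session is a simple destructive write/read-back pass over the raw device, something along these lines. A sketch: /dev/nvme1n1 is an assumption (check lsblk first), and it wipes the drive:

# Destructive write/read-back test for a suspect NVMe drive.
# WARNING: this wipes the device. ASSUMPTION: the Emtec shows up as
# /dev/nvme1n1 under Ubuntu - verify with lsblk before running.
import os

DEV     = "/dev/nvme1n1"
CHUNK   = 4 * 1024 * 1024                # 4 MiB per write
CHUNKS  = 1024                           # test the first 4 GiB only
PATTERN = bytes(range(256)) * (CHUNK // 256)

def read_exact(fd, n):
    buf = b""
    while len(buf) < n:
        part = os.read(fd, n - len(buf))
        if not part:
            break
        buf += part
    return buf

fd = os.open(DEV, os.O_WRONLY)
for _ in range(CHUNKS):
    os.write(fd, PATTERN)
os.fsync(fd)                             # make sure it actually hit the disk
os.close(fd)

fd = os.open(DEV, os.O_RDONLY)
for i in range(CHUNKS):
    if read_exact(fd, CHUNK) != PATTERN:
        raise SystemExit(f"mismatch in chunk {i} - the drive looks bad")
os.close(fd)
print("first 4 GiB wrote and read back cleanly")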

 

 


Okay, it's an Unraid bug: the Emtec X300 500GB NVMe seems not to be compatible with it.

I booted Ubuntu. It can format and access the Emtec. I also overwrote it completely with zeros and put files on it.

 

When I start Unraid and assign the NVMe to any new pool, whether XFS, BTRFS, or ZFS, Unraid starts spewing call traces and freezes as soon as I tick and hit "Yes, format unmountable devices". It's reproducible.

Maybe it's the kernel, FUSE, or Unraid itself. And of course I did the SMART tests, even the UEFI SMART test. They're all okay. All components are brand new (about a month old).
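
To be concrete, the checks I ran were along these lines. A sketch: it needs the smartmontools and nvme-cli packages, /dev/nvme0 is an assumption (check nvme list), and not every NVMe supports a self-test:

# Pull SMART/health data for the suspect NVMe.
# ASSUMPTION: the drive is /dev/nvme0; adjust to match the output of
# nvme list. Requires smartmontools and nvme-cli.
import subprocess

DEV = "/dev/nvme0"

for cmd in (
    ["smartctl", "-a", DEV],           # full SMART report
    ["smartctl", "-t", "short", DEV],  # start a short self-test (if supported)
    ["nvme", "smart-log", DEV],        # NVMe-native health log
):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=False)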

 

And yes, I ran Memtest 28 days ago when I first built everything. It took ages. I haven't repeated it since, because Unraid was / is rock stable with the other components and also runs a lot of Docker containers, which rely heavily on memory and CPU.

 

Is there anything I can do to help the devs debug this (log files etc.)? Is anyone interested in investigating? I'm willing to help. I work as an IT administrator in real life and I'm experienced with the Linux shell too.
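
One thing I can set up right away: since Unraid keeps /var/log in RAM, everything is lost on a hard reset, so mirroring new syslog lines to the flash drive would preserve the trace. A rough sketch of the idea (Unraid also has a built-in mirror-syslog-to-flash setting that does this properly; the output path below is just my own choice):

# Mirror new syslog lines to the USB flash drive so they survive a hard
# reset (Unraid keeps /var/log in RAM). Destination path is my own choice.
import os, time

SRC = "/var/log/syslog"
DST = "/boot/logs/syslog-persist.txt"    # /boot is the flash drive

os.makedirs(os.path.dirname(DST), exist_ok=True)
with open(SRC, "r", errors="replace") as src, open(DST, "a") as dst:
    src.seek(0, os.SEEK_END)             # only capture lines from now on
    while True:
        line = src.readline()
        if not line:
            time.sleep(0.5)
            continue
        dst.write(line)
        dst.flush()
        os.fsync(dst.fileno())           # push each line onto the flash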

 


@JorgeB Here is the syslog from right after I ticked "Yes, format unmountable devices" and clicked Format and OK.

I have to hard reset because Unraid freezes. As said, under Ubuntu it works fine. A different NVMe also works with my Unraid, but it has some general problems (on all systems), so I need to replace it. So I don't think it's my hardware in general.

Unraid freeze Format NVMe.rtf


I see. Also very recent.

Now I wonder (I haven't tried it yet) whether my still-working old NVMe will also fail to format. I created it with Unraid 6.12.7

 

and then updated to .8 plus the Docker fix plugin. That's all I changed on the software side. Maybe I'll try it tomorrow.

  • 1 month later...

Unraid 6.12.9 has been released with a new kernel, but it's still 6.1.x, just with a higher x. So I assume it's only bug fixes?

 

Is there a way to start only the pool so I can test formatting my NVMe? I don't want to simply try it again, because when Unraid freezes I have to hard reset and check parity again, and that always takes around 3 days.

 

I've also noticed this yellow line in the logs when starting up:

kernel: nvme 0000:04:00.0: VPD access failed.  This is likely a firmware bug on this device.  Contact the card vendor for a firmware update
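
In case it helps, the VPD that message refers to is exposed in sysfs and can be read directly. A sketch: the PCI address is taken from the log line above, and note that reading a buggy VPD can itself stall the box:

# Try to read the PCI Vital Product Data the kernel warns about.
# The address 0000:04:00.0 comes straight from the syslog line above.
# CAUTION: reading a buggy VPD can hang; don't run this on a box you
# can't afford to reset.
PCI = "/sys/bus/pci/devices/0000:04:00.0"

with open(f"{PCI}/vendor") as f:
    print("vendor:", f.read().strip())
with open(f"{PCI}/device") as f:
    print("device:", f.read().strip())

try:
    with open(f"{PCI}/vpd", "rb") as f:
        data = f.read(64)                # first bytes of the VPD capability
    print("vpd bytes:", data.hex())
except OSError as e:
    print("vpd read failed:", e)         # matches the firmware-bug warning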

 

Kind regards,
PhantombrainM

