
Cache disc 1 not installed, Shares unprotected


Solved by JorgeB


49 minutes ago, JorgeB said:

Is there supposed to be a cache1 NVMe device? If yes, it's not being detected at the hardware level.

That's the point. I have two identical WD NVMe drives set up as a cache pool, and it has been running for more than a year. Now the cache 1 drive is no longer detected...
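To confirm whether a pool member is visible to the OS at all, one quick check is to list the NVMe namespaces the kernel exposes under /dev (`lsblk` shows the same information). A minimal sketch, assuming standard Linux NVMe naming (nvme0n1, nvme0n1p1, ...); the function names are hypothetical helpers, not part of Unraid:

```shell
#!/bin/sh
# Hypothetical helpers: list NVMe namespaces (whole drives) the kernel exposes.
# If a pool member is missing from this list, the OS never saw the device.

# list_nvme_namespaces [DEV_DIR]: print names like nvme0n1 found in DEV_DIR (default /dev)
list_nvme_namespaces() {
    local dev_dir="${1:-/dev}" path
    for path in "$dev_dir"/nvme[0-9]*n[0-9]*; do
        [ -e "$path" ] || continue          # glob matched nothing
        case "${path##*/}" in
            *p[0-9]*) continue ;;           # skip partitions such as nvme0n1p1
        esac
        echo "${path##*/}"
    done
}

# count_nvme_namespaces [DEV_DIR]: print how many namespaces were found
count_nvme_namespaces() {
    list_nvme_namespaces "$1" | grep -c .
}
```

On a healthy two-drive pool, `list_nvme_namespaces` would be expected to print two names; if only one appears, the missing drive is not being detected by the kernel, which points at hardware, BIOS, or slot issues rather than the filesystem.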

  • 2 weeks later...

Hello, I had some big problems with my machine. I've got an Enermax AIO water cooler, and this little guy just died.

I was wondering why my server had shut down. I started it again, and right after booting and going to the Dashboard I saw my lost cache NVMe. But it was listed under UD 😀 (Yeah, it's alive!)

Then I saw that my cache showed: Unmountable: no filesystem.

 

After I realized this, the whole machine shut down hard. I went to the cellar and realized: "Yes, my server is offline." I took it and connected it to a monitor to check the BIOS. Oh hell. The CPU temperature went from 30°C to around 109°C within seconds, and then it shut down. I checked the water cooler, disassembled it, cleaned the CPU and cooler block, applied new thermal paste and booted again. Same result.

Now I've switched to the boxed air cooler and et voilà, the CPU is chilling at 42°C 😇

 

But what should I do now with my broken cache pool? Disk 2 is still part of the pool, but disk 1 is listed under Unassigned Devices. Should I still check whether NVMe 1 is broken? SMART doesn't show any errors.

[Screenshot]

 

The only information I found was for Dev 1 (cache pool NVMe 1):

[Screenshot]

 

How can I restore the whole thing? All my Docker data and temporary downloads are (or were) stored in the cache pool... 😪
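On the SMART question: an unclean power loss, like the overheating shutdowns described above, shows up in the NVMe health counters even when the overall result says PASSED. A minimal sketch that filters the relevant lines from `smartctl -a` output; the field labels assume smartctl's NVMe output format, and the helper name and the device path /dev/nvme0 in the usage line are assumptions:

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -a` output for an NVMe drive on stdin
# and print the fields most relevant to a drive that dropped out of a pool.
summarize_nvme_smart() {
    # Split each line at the first ": " (colon plus any spaces) into label and value.
    awk -F': *' '
        /^SMART overall-health self-assessment test result/ { print "health=" $2 }
        /^Critical Warning/                { print "critical_warning=" $2 }
        /^Media and Data Integrity Errors/ { print "media_errors=" $2 }
        /^Unsafe Shutdowns/                { print "unsafe_shutdowns=" $2 }
    '
}
```

Usage: `smartctl -a /dev/nvme0 | summarize_nvme_smart`. A critical warning other than 0x00 or a non-zero media error count would be a reason to distrust the drive; a rising unsafe shutdown count on its own is consistent with the hard power-offs.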

6 hours ago, JorgeB said:

Please post the diagnostics.

Hey @JorgeB,

 

here are two diagnostics:

 

The first is after a fresh reboot with the array stopped (...-1536.zip).

The second is after manually starting the array (...-1537.zip).

 

Dashboard screenshot:

[Screenshot]

 

I hope someone can help me find a way to rescue or rebuild my cache pool.

 

 

apollon-diagnostics-20220511-1536.zip apollon-diagnostics-20220511-1537.zip

May 11 15:36:33 Apollon kernel: BTRFS error (device nvme0n1p1): super_num_devices 1 mismatch with num_devices 1 found here

 

The superblock is corrupt, and the other pool member is way out of sync, though that's expected if it dropped offline earlier. You can try a backup superblock to see if it works. Stop the array, then type:

 

btrfs-select-super -s 1 /dev/nvme0n1p1

 

Then reboot and post new diags.
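Before overwriting the primary superblock it can help to look at the backup copy first. btrfs keeps the primary superblock at 64 KiB and backup copies at 64 MiB and 256 GiB (a copy only exists if the device is large enough). The sketch below assumes the device /dev/nvme0n1p1 from the log line and a stopped array; it refuses to touch a mounted device, prints copy 1 with `btrfs inspect-internal dump-super`, and only then runs the btrfs-select-super command from above. The helper names are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers around btrfs-select-super.

# super_offset N: byte offset of btrfs superblock copy N
# (0 = primary at 64 KiB; copies 1 and 2 at 16 KiB << 12*N, i.e. 64 MiB and 256 GiB)
super_offset() {
    case "$1" in
        0) echo $((64 * 1024)) ;;
        1|2) echo $((16 * 1024 << (12 * $1))) ;;
        *) echo "invalid superblock copy: $1" >&2; return 1 ;;
    esac
}

# is_mounted DEV [MOUNTS_FILE]: succeed if DEV is a mount source (default /proc/mounts)
is_mounted() {
    local dev="$1" mounts="${2:-/proc/mounts}"
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}

# restore_super DEV [COPY]: show backup copy COPY, then write it over the primary
restore_super() {
    local dev="$1" copy="${2:-1}"
    if is_mounted "$dev"; then
        echo "$dev is mounted; stop the array first" >&2
        return 1
    fi
    btrfs inspect-internal dump-super -s "$copy" "$dev" || return 1
    btrfs-select-super -s "$copy" "$dev"
}
```

Usage: `restore_super /dev/nvme0n1p1 1`. If dump-super shows a sane copy 1 (matching fsid, plausible generation), the overwrite is more likely to help; if copy 1 looks damaged too, it is better to post the output before going further.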

16 minutes ago, JorgeB said:

You can try this: physically disconnect the other NVMe device, the one that is currently unassigned, and try again.

Okay, I will try it. Not that convenient, since the NVMe drives sit under a heavy heat spreader on my mainboard. But hopefully it will work after that 🤗


Hey, back again.

 

I didn't have much time over the last few days, but I think I have some partially good news.

 

I disconnected the NVMe listed under UD and started my rig.

 

[Screenshot]

 

But what to do now?

  1. Delete dev1 under Historical Devices? (I think that way, when I put it back in, Unraid shouldn't place it under UD again.)
  2. Stop the array, set the number of cache pool devices to 1, and back up my data/move it to the array?

Or should I back up/move my data to the array now?

 

I think I'll first wait for @JorgeB to look at my diagnostics and check whether everything is OK.

apollon-diagnostics-20220520-1035.zip

  • Solution

That's good news!

 

UD historical devices don't really matter for this; you can remove it now or later. I assume you plan to re-add the other device to the pool?

 

If yes, first make sure backups are up to date. Then you'll need to wipe the other device before adding it back to the pool. You can do it like this:

 

-check that array auto start is disabled, then shut down the server

-reconnect the other NVMe device

-power on the server, don't start the array

-wipe the unassigned device with

blkdiscard /dev/nvme#n1

Replace # with the correct number. I'm not sure if 6.9.2 needs -f for blkdiscard when existing data is detected; if so, use it.

-assign it back to the pool

-start the array to begin the balance
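The wipe step can be wrapped in a small guard so a typo in the device number does not discard the wrong drive. A minimal sketch, assuming whole-namespace names like /dev/nvme1n1; `wipe_nvme` and `valid_nvme_namespace` are hypothetical helpers, and DRY_RUN=1 only prints the command. The guard only catches devices (or their partitions) that are mounted; it cannot tell which unmounted drive you mean, so confirm the number in the Unraid GUI first:

```shell
#!/bin/sh
# Hypothetical guard around the blkdiscard step.

# valid_nvme_namespace DEV: accept only whole NVMe namespaces such as /dev/nvme1n1
valid_nvme_namespace() {
    printf '%s\n' "$1" | grep -Eq '^/dev/nvme[0-9]+n[0-9]+$'
}

# wipe_nvme DEV [force]: discard all blocks on DEV (adds -f when "force" is given).
# With DRY_RUN=1 the command is printed instead of executed.
wipe_nvme() {
    local dev="$1" force="$2" flag=""
    if ! valid_nvme_namespace "$dev"; then
        echo "refusing: $dev is not a whole NVMe namespace" >&2
        return 1
    fi
    # Refuse if the device itself or one of its partitions (nvme1n1p1, ...) is mounted.
    if grep -qs "^$dev[p ]" /proc/mounts; then
        echo "refusing: $dev (or a partition of it) is mounted" >&2
        return 1
    fi
    [ "$force" = "force" ] && flag="-f"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "blkdiscard ${flag:+$flag }$dev"
    else
        blkdiscard $flag "$dev"
    fi
}
```

For example, `DRY_RUN=1 wipe_nvme /dev/nvme1n1 force` only prints `blkdiscard -f /dev/nvme1n1`; running it without DRY_RUN performs the discard, after which the device can be assigned back to the pool.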

 

12 minutes ago, JorgeB said:

(...) I assume you plan to re-add the other device to the pool?

That's correct.

 

12 minutes ago, JorgeB said:

If yes, first make sure backups are up to date

Is it enough to change the share settings from "Cache: only" to "yes" and start the mover? I'll also shut down all Docker containers/VMs from the Settings tab, and maybe run a CA Backup as well. Losing some temporary files is not a big deal, since I can get them back, but losing the Docker containers with all their settings would be a horror for me.

 

12 minutes ago, JorgeB said:

 

-check that array auto start is disabled, then shut down the server

-reconnect the other NVMe device

-power on the server, don't start the array

-wipe the unassigned device with

blkdiscard /dev/nvme#n1

Replace # with the correct number. I'm not sure if 6.9.2 needs -f for blkdiscard when existing data is detected; if so, use it.

-assign it back to the pool

-start the array to begin the balance

 

After I add the freshly wiped NVMe back to the cache pool, do I just set the old cache shares back to "Only" and start the mover?


IMHO moving everything to the array and back is overkill. Just make sure anything important, like appdata, is backed up; you should always have backups of anything important, since redundancy is not a substitute for them. When you add the device, the existing pool data is kept, and you don't even need to shut down Docker/VMs: they can stay online while the data is replicated to the other device.
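For the "make sure anything important like appdata is backed up" part, a plain tar archive on the array is one option besides CA Backup. A minimal sketch, assuming appdata lives at /mnt/cache/appdata and that /mnt/user/backups is a hypothetical share on the array; containers that write during the archive may leave inconsistent files in it, so stopping them for the backup itself is the safer choice. The second helper only runs read-only btrfs commands to check that both devices are in the pool and whether the balance is still running:

```shell
#!/bin/sh
# Hypothetical helpers for the backup and re-add steps.

# backup_dir PARENT NAME DEST: archive PARENT/NAME to DEST/NAME-YYYYmmdd-HHMMSS.tar.gz
# and print the archive path.
backup_dir() {
    local parent="$1" name="$2" dest="$3" archive
    archive="$dest/$name-$(date +%Y%m%d-%H%M%S).tar.gz"
    mkdir -p "$dest" || return 1
    tar -czf "$archive" -C "$parent" "$name" || return 1
    # Quick integrity check: list the archive without extracting it.
    tar -tzf "$archive" >/dev/null || return 1
    echo "$archive"
}

# show_pool_state [MOUNTPOINT]: read-only view of pool members and balance progress
show_pool_state() {
    local mnt="${1:-/mnt/cache}"
    btrfs filesystem show "$mnt"
    btrfs balance status "$mnt"
}
```

Usage: `backup_dir /mnt/cache appdata /mnt/user/backups` before re-adding the device, then `show_pool_state /mnt/cache` after starting the array to watch the balance.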

2 hours ago, JorgeB said:

(...)

-wipe the unassigned device with

blkdiscard /dev/nvme#n1

Replace # with the correct number. I'm not sure if 6.9.2 needs -f for blkdiscard when existing data is detected; if so, use it.

I had to use it with "-f".

 

2 hours ago, JorgeB said:

-assign it back to the pool

The drive is still listed under UD. Should I just add it back to the cache pool, or hit the Format button first?

[Screenshot]
