
I lost one of my Cache array disks and can't start the array


Solved by JorgeB


Hello all.

I installed a new USB PCI card, and when I rebooted, the array wouldn't start, reporting that one of my cache disks was missing.

The cache was a RAID1 pool of two 480 GB SanDisk SSD Plus drives.

At first I thought I'd knocked something, so I checked power, SATA, etc. That's all fine. I then swapped the leads around and the error follows the same drive, so I know it's not a lead.

The faulty drive shows up in Unassigned Devices as an odd device ('Sandisk Milpitas') of only 16.4 KB that won't format.

I can get the array to start by stopping it and setting pool devices to 1, but my VM (which was on the cache drive) is missing.

Can I get help with the following, please?
How do I check the faulty SSD?
How do I fix the cache pool if I buy a new, identical SSD?
How do I change the cache to a single drive?

Diagnostics attached.

Many thanks, I'm a bit lost!

tower-diagnostics-20231231-1239.zip


Hello again, thanks for the reply. Happy new year.

I did on the first post, but I've just done another and it's attached.

The only thing I may have done whilst checking leads is attach the working SSD to a different SATA connector from its original one, but there are only two options, so I can easily put it back. I hope that makes sense.

A replacement SSD is arriving tomorrow.

Wx

tower-diagnostics-20240101-1145.zip

Dec 31 15:09:48 Tower emhttpd: /mnt/cache: no btrfs or device /dev/sdg1 is not single

It looks like the pool has a single slot, and that won't work. Do the following:

1. Stop the array.
2. Unassign the pool device.
3. Start the array.
4. Stop the array.
5. Change the pool slots to 2.
6. Assign the pool device to the first slot.
7. Start the array.
8. Post new diags.


I stopped the array and unassigned the pool device (so it now says 'no device'), but I couldn't start the array without checking 'Yes, I want to do this', so I did.

Once it had started, I stopped the array, changed the pool to 2 slots, and restarted.

Yay, Dockers and VMs are back. I can't thank you enough. I'm hoping the process of rebuilding the RAID1 is easy once the new drive arrives.

New diags attached.

What I now need to do is work out how to back up the VMs and Dockers (apart from relying on RAID1).

tower-diagnostics-20240101-1320.zip

Jan  1 13:14:20 Tower kernel: BTRFS info (device sdg1): 451 enospc errors during balance
Jan  1 13:14:20 Tower kernel: BTRFS info (device sdg1): balance: ended with status: -28

The pool failed to balance to single because there isn't enough space. Move or delete some data, then restart the array and post new diags.
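
For anyone wondering what "status: -28" means in that log line, here is a minimal Python sketch that decodes it. It assumes the kernel reports the balance result as a negative errno value (the usual kernel convention); the function name `explain_balance_status` and the regular expression are illustrative only and not part of Unraid or btrfs.

```python
import errno
import os
import re

# Hypothetical pattern, written against the syslog line quoted above.
BALANCE_STATUS = re.compile(
    r"BTRFS info \(device (?P<device>\w+)\): "
    r"balance: ended with status: (?P<status>-?\d+)"
)


def explain_balance_status(log_line):
    """Return (device, errno_name, message) for a balance status line.

    Returns None if the line is not a 'balance: ended with status' line.
    Assumes a non-zero status is a negated errno value.
    """
    match = BALANCE_STATUS.search(log_line)
    if match is None:
        return None
    device = match.group("device")
    status = int(match.group("status"))
    if status == 0:
        return device, "OK", "balance completed"
    code = -status
    # errno.errorcode maps numeric codes to names such as 'ENOSPC'.
    name = errno.errorcode.get(code, "UNKNOWN")
    return device, name, os.strerror(code)


if __name__ == "__main__":
    line = (
        "Jan  1 13:14:20 Tower kernel: BTRFS info (device sdg1): "
        "balance: ended with status: -28"
    )
    print(explain_balance_status(line))
```

Errno 28 is ENOSPC ("no space left on device"), which matches the "enospc errors during balance" line logged just before it.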


@itimpi Thank you.
VM Backup says it's a BETA, so I've just installed Appdata Backup.

I added a non-exported share, 'Appdatabackup', on the main array (with Security set to Private) and have set this as the backup location for now. I can't decide whether this is good enough or whether I should add an external drive via Unassigned Devices and use that instead.

I then configured the settings to back up to /mnt/user/Appdatabackup/

It says it's backing up:
/mnt/user/appdata
/mnt/cache/appdata

Should I also add these?
/mnt/user/domains/
/mnt/user/isos/

Manual backup currently running. Thank you.
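
One way to judge whether the extra paths are worth adding, and whether the backup share has room for them, is to measure each candidate first. This is a rough Python sketch, not anything provided by the Appdata Backup plugin; the helper names `directory_size_bytes` and `summarize_backup_sources` are made up for illustration. It reports apparent file sizes, so sparse VM disk images may look larger than the space they actually occupy.

```python
import os


def directory_size_bytes(root):
    """Return the total apparent size in bytes of regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so linked files are not counted.
            if os.path.islink(path):
                continue
            try:
                total += os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; leave it out.
                continue
    return total


def summarize_backup_sources(paths):
    """Map each path to its size in bytes, or None if it is not a directory."""
    return {
        path: directory_size_bytes(path) if os.path.isdir(path) else None
        for path in paths
    }


if __name__ == "__main__":
    # Paths taken from the post above.
    candidates = ["/mnt/user/appdata", "/mnt/user/domains", "/mnt/user/isos"]
    for path, size in summarize_backup_sources(candidates).items():
        if size is None:
            print(f"{path}: not found")
        else:
            print(f"{path}: {size / 1e9:.1f} GB")
```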

23 minutes ago, JorgeB said:
Jan  1 13:14:20 Tower kernel: BTRFS info (device sdg1): 451 enospc errors during balance
Jan  1 13:14:20 Tower kernel: BTRFS info (device sdg1): balance: ended with status: -28

The pool failed to balance to single because there isn't enough space. Move or delete some data, then restart the array and post new diags.

@JorgeB

sdg1 is a 'SanDisk_SSD_PLUS_480GB (sdg) 480Gb' with 107 GB free, so I'm not sure why that error is coming up.

Could this be because I re-plugged the physical devices the wrong way round after testing the leads?

The faulty SSD, which currently appears as 'Sandisk_Milpitas_SSD (sdh)' and should be the same 480 GB SSD as above, only shows as a 16.4 KB drive in Unassigned Devices, if that's relevant.


@trurl

I may revisit this, as my current cache, a RAID1 of 2x 480 GB SSDs, is a bit large for a simple write cache, and I currently also use it for VMs and Dockers.

I've got 2x 120 GB SSDs (different brands) and a 240 GB SSD spare, so I may redo the write cache as a 2x 120 GB RAID1, giving redundancy for newly written data until mover moves it to the main array.

That would free up the 2x 480 GB SSDs and the 240 GB SSD for VMs, Dockers, and backups.


I wanted to point out that I've now physically removed the faulty drive. I don't have the new one to replace it yet.

With the array stopped, Pool Devices says 'Slots 2'; the first (Cache) is my 480 GB SSD and the second (Cache 2) says 'unassigned'.

If I change 'Slots' to 1, which is how I assume you "remove the missing device", I then can't start the array, as it says:
'Wrong Pool State. cache - too many missing/wrong devices'

Am I doing something wrong? This is what I see:

Screenshot 2024-01-02 at 12.21.57.png
