trurl Posted March 3, 2021
Do you have any VMs or Dockers that might take a long time to shut down?
Aurial Posted March 3, 2021 (edited)
I've managed to get my cache mounted again, so I'm leaving this here in case anyone else is struggling with this. The cache wouldn't mount and reported "Unmountable: not mounted", just like in the OP's screenshot.

I resolved it by installing Unassigned Devices, then stopping the array and deselecting my Samsung Evo drive under cache drives. Next I selected it in Unassigned Devices and clicked mount. It mounted straight away and I could browse the filesystem on it without any problems. Then I set it back as the cache drive, started the array, and it mounted properly.

Hope this helps someone.
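For anyone who prefers the command line, the same "mount it somewhere else to see if the filesystem is actually intact" check can be sketched roughly as below. This is only an illustration: the device path /dev/sdX1 and the mount point are placeholders (check lsblk for your real cache partition), it assumes an unencrypted filesystem, and the array should be stopped first so nothing else holds the device.

```shell
#!/bin/sh
# Hypothetical device path -- substitute your cache partition (see lsblk).
DEV=/dev/sdX1
MNT=/mnt/tmpcheck
if [ -b "$DEV" ]; then
  mkdir -p "$MNT"
  # Read-only mount, so inspecting the drive cannot make things worse.
  mount -o ro "$DEV" "$MNT" && ls "$MNT" && umount "$MNT"
  status="inspected $DEV read-only"
else
  status="device $DEV not present"
fi
echo "$status"
```

If the read-only mount succeeds and the files look fine, that points at the pool-assignment side rather than filesystem corruption, which matches what Unassigned Devices showed here.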
srfnmnk Posted March 3, 2021 (edited)
I wound up having to do the exact same thing as @Aurial, with one MAJOR difference: I can mount the disk using Unassigned Devices with no issue. I am in the process of moving all of my data to another disk so I can format and restore. I think the issue is that the drive was encrypted -- I didn't see any release/upgrade notes specific to handling encrypted drives. This seems like an oversight? Still working on restoring / setting up a secondary pool with the same encryption key to see if that works; will check back in.

UPDATE - Posting the log output of the array startup. As you can see, it claims it cannot interact with the device and/or there's corruption, but Unassigned Devices can interact with it just fine.

UPDATE 2 - I reformatted my cache drive (btrfs encrypted) and restored everything -- the pool still cannot mount the encrypted cache drive.

UPDATE 3 - I was finally able to get my NVMe (encrypted btrfs) working as cache for my pool, but I did have to: mount it using Unassigned Devices; back everything up (I had a remote backup, but this was way faster); add it back to the pool as the cache drive; start the pool (Unmountable Filesystem still); click format at the bottom to format the cache drive (be sure only the cache drive is there and no data disks); and then load all my data back to /mnt/cache. This worked. Back up and running, and all my dockers are back along with everything else on 6.9.0.

unraid_cache_drive_failure.txt
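Unraid's encrypted pools use LUKS under the hood, so the "mount it via Unassigned Devices" step for an encrypted drive can be approximated by hand along these lines. Again a sketch only: the device path and mapper name are hypothetical placeholders, and the script only proceeds if a LUKS header is actually present.

```shell
#!/bin/sh
# Hypothetical NVMe partition path -- substitute your own (see lsblk).
DEV=/dev/nvmeXn1p1
MNT=/mnt/tmpcheck
if [ -b "$DEV" ] && cryptsetup isLuks "$DEV"; then
  # Prompts for the pool's encryption passphrase.
  cryptsetup luksOpen "$DEV" cache_check
  mkdir -p "$MNT"
  # Read-only, so we can inspect the contents without risking them.
  mount -o ro /dev/mapper/cache_check "$MNT"
  status="opened and mounted read-only at $MNT"
else
  status="no LUKS device at $DEV"
fi
echo "$status"
```

When done, unmount and close it again (umount, then cryptsetup luksClose cache_check) before handing the drive back to the pool. If cryptsetup can open the device fine with the same key, that's more evidence the data and passphrase are good and the problem sits in how the pool is mounting it.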
Aurial Posted March 3, 2021
3 minutes ago, srfnmnk said:
I wound up having to do the exact same thing as @Aurial with one MAJOR difference. I can mount the disk using unassigned devices with no issue...

It seems I spoke too soon. Although Unraid was saying the disk was mounted, I began getting warnings from the Fix Common Problems plugin that Unraid was unable to write to the disk. I tried to use mover to move all the contents back across to the array, but it hadn't moved anything after about an hour and then gave up, saying it had finished.

I ended up wiping the cache drive and restoring my appdata folder from a backup. I've got my dockers back up and running now. I think I've lost a couple of files from shares that hadn't been moved to the array yet by the mover, but oh well. I hope you manage to get yours up and running again.

Out of interest, are you using a Samsung Evo drive as your cache?
srfnmnk Posted March 3, 2021
1 minute ago, Aurial said:
are you using a Samsung Evo drive as your cache?

No, I'm using a Sabrent NVMe drive. I do have Samsung Evos in Unassigned Devices, but not as cache. I'm thinking my issue is the fact that my cache drive was encrypted and the upgrade didn't account for that... Glad to hear you're back up and running.
a_bomb Posted March 8, 2021 (edited)
I have also been having a lot of issues with the 6.9 upgrade: cache drive unmountable (Samsung EVO 250GB as the cache drive). I ended up wiping it before seeing the solutions about stopping the array and unmounting/mounting it (lesson learned there). I seem to have got past that and re-created almost everything I needed.

I had a lot of issues with the server rebooting on its own as well, usually after about 30-40 minutes, and it seemed like it would just keep doing it until I reverted back to 6.8.3. I had 6.9 stable for over 7 hours today, then started to bring back my VMs (had to recreate the domains share and libvirt folder) and the reboots started up again, which also caused a parity check. It just did it again while I was typing this -- only up for 9 minutes before a reboot this time.

skynet-diagnostics-20210308-0015.zip
trurl Posted March 8, 2021
Possibly unrelated, but your appdata and system shares have files on the array. You should clean that up. What do you get from the command line with this?

ls -lah /mnt/disk23/system
a_bomb Posted March 8, 2021 (edited)
30 minutes ago, trurl said:
Possibly unrelated but your appdata and system shares have files on the array. You should clean that up. What do you get from the command line with this? ls -lah /mnt/disk23/system

root@Skynet:~# ls -lah /mnt/disk23/system
total 0
drwxrwxrwx 3 nobody users  20 Mar  3 21:15 ./
drwxrwxrwx 8 nobody users 155 Mar  7 23:50 ../
drwxrwxrwx 2 root   root   24 Mar  3 21:15 docker/

So should I go ahead and move docker/ back to the cache drive, or just delete it? There also seems to be an appdata folder on disk 23. I could just remove that, since all of that data should be on the cache drive.
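The same check can be scripted across every array disk rather than guessing which disk number to look at. A sketch, using Unraid's /mnt/disk* layout; the demo below runs against a throwaway scratch directory standing in for /mnt (so it's safe to try anywhere), and the scan function is a hypothetical helper, not an Unraid command.

```shell
#!/bin/sh
# Scan for share folders that should live on cache but ended up on array disks.
scan() {
  for d in "$1"/disk*/appdata "$1"/disk*/system; do
    [ -d "$d" ] && echo "stray: $d"
  done
}

# Demo on a scratch tree mimicking the situation above.
# On a real server you would call:  scan /mnt
root=$(mktemp -d)
mkdir -p "$root/disk23/system/docker"
found=$(scan "$root")
echo "$found"
rm -rf "$root"
```

Anything the scan prints is data that mover left behind (or that was written while cache was unmountable) and is a candidate for cleanup once the real copy on cache is confirmed.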
trurl Posted March 8, 2021
Probably better to just delete and recreate docker.img. Settings - Docker: disable, then delete docker.img on that same page. Enable will recreate it on cache as specified. Then you can reinstall your containers exactly as they were with the Previous Apps feature on the Apps page.
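After re-enabling Docker, it's worth confirming the image really landed back where it should. A small sketch; the path below is the stock system-share location and is an assumption -- adjust it if your docker.img lives somewhere else.

```shell
#!/bin/sh
# Default docker.img location on a stock setup -- adjust if yours differs.
IMG=/mnt/user/system/docker/docker.img
if [ -f "$IMG" ]; then
  ls -lah "$IMG"
  status="present"
else
  status="missing: $IMG"
fi
echo "$status"
```

It's also worth re-running the earlier ls check against the array disk afterwards, to make sure no second copy quietly got recreated there.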
a_bomb Posted March 8, 2021
Thanks, I will go ahead and do that. Then if it crashes again, I guess I will just roll back to 6.8.3 again.
a_bomb Posted March 12, 2021 (edited)
Well, I got up to 11 hours after doing the above steps and then upgrading to 6.9.1. I was tailing the syslog and got this before it shut down. Going by the timestamps, it rebooted more than once; this is just what was there on the terminal I had open last night. There seem to be memory errors for sure, but it also looks like it is handling them? I'm not sure how I would go about pulling the exact sticks, short of assuming Channel 0 = Channel A etc., or pulling them one by one and checking the log. I'm thinking about pulling the 1050 as well.

Mar 12 00:49:33 Skynet kernel: EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0x6a751a offset:0x340 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0093 socket:0 ha:0 channel_mask:8 rank:1)
Mar 12 01:26:41 Skynet kernel: mce: [Hardware Error]: Machine check events logged
Mar 12 01:26:41 Skynet kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Mar 12 01:26:41 Skynet kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 7: 8c00004000010093
Mar 12 01:26:41 Skynet kernel: EDAC sbridge MC1: TSC 1a5c2d76dc3d
Mar 12 01:26:41 Skynet kernel: EDAC sbridge MC1: ADDR 727108f40
Mar 12 01:26:41 Skynet kernel: EDAC sbridge MC1: MISC 1424a5c86
Mar 12 01:26:41 Skynet kernel: EDAC sbridge MC1: PROCESSOR 0:306e4 TIME 1615530401 SOCKET 1 APIC 20
Mar 12 01:26:41 Skynet kernel: EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#1_DIMM#1 (channel:1 slot:1 page:0x727108 offset:0xf40 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0093 socket:1 ha:0 channel_mask:2 rank:5)
Mar 12 01:34:45 Skynet kernel: mce: [Hardware Error]: Machine check events logged
Mar 12 01:34:45 Skynet kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Mar 12 01:34:45 Skynet kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 7: 8c00004000010093
Mar 12 01:34:45 Skynet kernel: EDAC sbridge MC1: TSC 1b80e90da98b
Mar 12 01:34:45 Skynet kernel: EDAC sbridge MC1: ADDR d65da23c0
Mar 12 01:34:45 Skynet kernel: EDAC sbridge MC1: MISC 425a4686
Mar 12 01:34:45 Skynet kernel: EDAC sbridge MC1: PROCESSOR 0:306e4 TIME 1615530885 SOCKET 1 APIC 20
Mar 12 01:34:45 Skynet kernel: EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0xd65da2 offset:0x3c0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0093 socket:1 ha:0 channel_mask:8 rank:1)
Mar 12 01:51:10 Skynet kernel: mce: [Hardware Error]: Machine check events logged
Mar 12 01:51:10 Skynet kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Mar 12 01:51:10 Skynet kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010093
Mar 12 01:51:10 Skynet kernel: EDAC sbridge MC0: TSC 1dd51b124c58
Mar 12 01:51:10 Skynet kernel: EDAC sbridge MC0: ADDR 68663a340
Mar 12 01:51:10 Skynet kernel: EDAC sbridge MC0: MISC 1526a5886
Mar 12 01:51:10 Skynet kernel: EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1615531870 SOCKET 0 APIC 0
Mar 12 01:51:10 Skynet kernel: EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0x68663a offset:0x340 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0093 socket:0 ha:0 channel_mask:8 rank:1)
Mar 12 01:59:03 Skynet kernel: mce: [Hardware Error]: Machine check events logged
Mar 12 01:59:03 Skynet kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Mar 12 01:59:03 Skynet kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 7: 8c00004000010093
Mar 12 01:59:03 Skynet kernel: EDAC sbridge MC1: TSC 1ef3720de11e
Mar 12 01:59:03 Skynet kernel: EDAC sbridge MC1: ADDR 7289aeb00
Mar 12 01:59:03 Skynet kernel: EDAC sbridge MC1: MISC 4214e486
Mar 12 01:59:03 Skynet kernel: EDAC sbridge MC1: PROCESSOR 0:306e4 TIME 1615532343 SOCKET 1 APIC 20
Mar 12 01:59:03 Skynet kernel: EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#1 (channel:0 slot:1 page:0x7289ae offset:0xb00 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0093 socket:1 ha:0 channel_mask:1 rank:5)
Mar 12 02:00:06 Skynet root: /etc/libvirt: 920.4 MiB (965103616 bytes) trimmed on /dev/loop3
Mar 12 02:00:06 Skynet root: /var/lib/docker: 15.5 GiB (16609398784 bytes) trimmed on /dev/loop2
Mar 12 02:00:06 Skynet root: /mnt/cache: 191.9 GiB (206013878272 bytes) trimmed on /dev/sdj1
Connection reset by 192.168.1.63 port 22
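Since each EDAC line names the socket, channel, and slot, a quick tally per DIMM location can help rank which stick to pull first. A sketch, using short sample lines modeled on the log above; on the server itself you would grep /var/log/syslog instead of the here-document.

```shell
#!/bin/sh
# Tally corrected-error (CE) reports per DIMM location from EDAC log lines.
counts=$(grep -o 'CPU_SrcID#[0-9]*_Ha#[0-9]*_Chan#[0-9]*_DIMM#[0-9]*' <<'EOF' | sort | uniq -c | sort -rn
EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#3_DIMM#0
EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#1_DIMM#1
EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#3_DIMM#0
EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#3_DIMM#0
EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#1
EOF
)
echo "$counts"
```

One caveat: the log's own count field already aggregates (the later "31 CE ... OVERFLOW" entries show coalesced errors), so this tallies report lines, not raw error counts -- treat it as a rough ranking only.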
JorgeB Posted March 12, 2021
Check the System Event Log in the BIOS, or over IPMI if available; it might have more info on the affected DIMM.
a_bomb Posted March 13, 2021
I didn't seem to find it there. I've taken out all but 4 sticks, just whittling it down until I don't see the errors anymore. It seems it crashed and rebooted again this morning (last log entries before that happened below). I went ahead and took out the 1050 just a moment ago as well.

Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 7: cc0007c000010093
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: TSC 93f4c92e09d4
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: ADDR 5ae239040
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: MISC 403e0486
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: PROCESSOR 0:306e4 TIME 1615625437 SOCKET 1 APIC 20
Mar 13 03:50:37 Skynet kernel: EDAC MC1: 31 CE memory read error on CPU_SrcID#1_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0x5ae239 offset:0x40 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0001:0093 socket:1 ha:0 channel_mask:8 rank:1)
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 7: cc00078000010093
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: TSC 93f4c92e889c
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: ADDR 5b17cbd80
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: MISC 52768086
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: PROCESSOR 0:306e4 TIME 1615625437 SOCKET 1 APIC 20
Mar 13 03:50:37 Skynet kernel: EDAC MC1: 30 CE memory read error on CPU_SrcID#1_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0x5b17cb offset:0xd80 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0001:0093 socket:1 ha:0 channel_mask:8 rank:1)
Connection reset by 192.168.1.63 port 22
Benemortasia Posted January 24, 2023
On 3/3/2021 at 9:29 PM, Aurial said:
Just to add, I have the exact same drive as my cache and it's also unmountable when the server came back up from reboot after upgrading.

I can add to this. I just did an upgrade to 6.11, also having a 1TB Samsung EVO, and the same thing happened. Currently looking around for the recent solutions provided.
trurl Posted January 24, 2023
22 minutes ago, Benemortasia said:
upgrade to 6.11

Upgrade from?