trurl Posted March 3, 2021
Do you have any VMs or Dockers that might take a long time to shut down?
Aurial Posted March 3, 2021 (edited)
I've managed to get my cache mounted again, so I'm leaving this here in case anyone else is struggling with this. The cache wouldn't mount and reported "Unmountable: not mounted", just like in the OP's screenshot.

I resolved it by installing Unassigned Devices, then stopping the array and deselecting my Samsung Evo drive under cache drives. Next I selected it in Unassigned Devices and clicked mount. It mounted straight away and I could browse the filesystem on it without any problems. Then I set it back as the cache drive, started the array, and it mounted properly.

Hope this helps someone.
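For anyone who prefers the command line, the same "mount it somewhere else to see if the filesystem is actually intact" check can be sketched roughly as below. This is only an illustration: the device path /dev/sdX1 and the mount point are placeholders (check lsblk for your real cache partition), it assumes an unencrypted filesystem, and the array should be stopped first so nothing else holds the device.

```shell
#!/bin/sh
# Hypothetical device path -- substitute your cache partition (see lsblk).
DEV=/dev/sdX1
MNT=/mnt/tmpcheck
if [ -b "$DEV" ]; then
  mkdir -p "$MNT"
  # Read-only mount, so inspecting the drive cannot make things worse.
  mount -o ro "$DEV" "$MNT" && ls "$MNT" && umount "$MNT"
  status="inspected $DEV read-only"
else
  status="device $DEV not present"
fi
echo "$status"
```

If the read-only mount succeeds and the files look fine, that points at the pool-assignment side rather than filesystem corruption, which matches what Unassigned Devices showed here.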
srfnmnk Posted March 3, 2021 (edited)
I wound up having to do the exact same thing as @Aurial, with one MAJOR difference: I can mount the disk using Unassigned Devices with no issue. I am in the process of moving all of my data to another disk so I can format and restore. I think the issue is that the drive was encrypted -- I didn't see any release/upgrade notes specific to handling encrypted drives. This seems like an oversight? Still working on restoring / setting up a secondary pool with the same encryption key to see if that works; will check back in.

UPDATE - Posting the log output of the array startup. As you can see, it claims it cannot interact with the device and/or there's corruption, but Unassigned Devices can interact with it just fine.

UPDATE 2 - I reformatted my cache drive (btrfs encrypted) and restored everything -- the pool still cannot mount the encrypted cache drive.

UPDATE 3 - I was finally able to get my NVMe (encrypted btrfs) working as cache for my pool, but I did have to: mount it using Unassigned Devices; back everything up (I had a remote backup, but this was way faster); add it back to the pool as the cache drive; start the pool (Unmountable Filesystem still); click format at the bottom to format the cache drive (be sure only the cache drive is there and no data disks); and then load all my data back to /mnt/cache. This worked. Back up and running, and all my dockers are back along with everything else on 6.9.0.

unraid_cache_drive_failure.txt
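Unraid's encrypted pools use LUKS under the hood, so the "mount it via Unassigned Devices" step for an encrypted drive can be approximated by hand along these lines. Again a sketch only: the device path and mapper name are hypothetical placeholders, and the script only proceeds if a LUKS header is actually present.

```shell
#!/bin/sh
# Hypothetical NVMe partition path -- substitute your own (see lsblk).
DEV=/dev/nvmeXn1p1
MNT=/mnt/tmpcheck
if [ -b "$DEV" ] && cryptsetup isLuks "$DEV"; then
  # Prompts for the pool's encryption passphrase.
  cryptsetup luksOpen "$DEV" cache_check
  mkdir -p "$MNT"
  # Read-only, so we can inspect the contents without risking them.
  mount -o ro /dev/mapper/cache_check "$MNT"
  status="opened and mounted read-only at $MNT"
else
  status="no LUKS device at $DEV"
fi
echo "$status"
```

When done, unmount and close it again (umount, then cryptsetup luksClose cache_check) before handing the drive back to the pool. If cryptsetup can open the device fine with the same key, that's more evidence the data and passphrase are good and the problem sits in how the pool is mounting it.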
Aurial Posted March 3, 2021
3 minutes ago, srfnmnk said:
I wound up having to do the exact same thing as @Aurial with one MAJOR difference. I can mount the disk using unassigned devices with no issue...

It seems I spoke too soon. Although Unraid was saying the disk was mounted, I began getting warnings from the Fix Common Problems plugin that Unraid was unable to write to the disk. I tried to use mover to move all the contents back across to the array, but it hadn't moved anything after about an hour and then gave up, saying it had finished.

I ended up wiping the cache drive and restoring my appdata folder from a backup. I've got my dockers back up and running now. I think I've lost a couple of files from shares that hadn't been moved to the array yet by the mover, but oh well. I hope you manage to get yours up and running again.

Out of interest, are you using a Samsung Evo drive as your cache?
srfnmnk Posted March 3, 2021
1 minute ago, Aurial said:
are you using a Samsung Evo drive as your cache?

No, I'm using a Sabrent NVMe drive. I do have Samsung Evos in Unassigned Devices, but not as cache. I'm thinking my issue is the fact that my cache drive was encrypted and the upgrade didn't account for that... Glad to hear you're back up and running.
a_bomb Posted March 8, 2021 (edited)
I have also been having a lot of issues with the 6.9 upgrade: cache drive unmountable (Samsung EVO 250GB as the cache drive). I ended up wiping it before seeing the solutions about stopping the array and unmounting/mounting it (lesson learned there). I seem to have got past that and re-created almost everything I needed.

I had a lot of issues with the server rebooting on its own as well, usually after about 30-40 minutes, and it seemed like it would just keep doing it until I reverted back to 6.8.3. I had 6.9 stable for over 7 hours today, then started to bring back my VMs (had to recreate the domains share and libvirt folder) and the reboots started up again, which also caused a parity check. It just did it again while I was typing this -- only up for 9 minutes before a reboot this time.

skynet-diagnostics-20210308-0015.zip
trurl Posted March 8, 2021
Possibly unrelated, but your appdata and system shares have files on the array. You should clean that up. What do you get from the command line with this?

ls -lah /mnt/disk23/system
a_bomb Posted March 8, 2021 (edited)
30 minutes ago, trurl said:
Possibly unrelated but your appdata and system shares have files on the array. You should clean that up. What do you get from the command line with this? ls -lah /mnt/disk23/system

root@Skynet:~# ls -lah /mnt/disk23/system
total 0
drwxrwxrwx 3 nobody users  20 Mar  3 21:15 ./
drwxrwxrwx 8 nobody users 155 Mar  7 23:50 ../
drwxrwxrwx 2 root   root   24 Mar  3 21:15 docker/

So should I go ahead and move docker/ back to the cache drive, or just delete it? There also seems to be an appdata folder on disk 23. I could just remove that, since all of that data should be on the cache drive.
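The same check can be scripted across every array disk rather than guessing which disk number to look at. A sketch, using Unraid's /mnt/disk* layout; the demo below runs against a throwaway scratch directory standing in for /mnt (so it's safe to try anywhere), and the scan function is a hypothetical helper, not an Unraid command.

```shell
#!/bin/sh
# Scan for share folders that should live on cache but ended up on array disks.
scan() {
  for d in "$1"/disk*/appdata "$1"/disk*/system; do
    [ -d "$d" ] && echo "stray: $d"
  done
}

# Demo on a scratch tree mimicking the situation above.
# On a real server you would call:  scan /mnt
root=$(mktemp -d)
mkdir -p "$root/disk23/system/docker"
found=$(scan "$root")
echo "$found"
rm -rf "$root"
```

Anything the scan prints is data that mover left behind (or that was written while cache was unmountable) and is a candidate for cleanup once the real copy on cache is confirmed.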
trurl Posted March 8, 2021
Probably better to just delete and recreate docker.img. Settings - Docker: disable, then delete docker.img on that same page. Enable will recreate it on cache as specified. Then you can reinstall your containers exactly as they were with the Previous Apps feature on the Apps page.
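After re-enabling Docker, it's worth confirming the image really landed back where it should. A small sketch; the path below is the stock system-share location and is an assumption -- adjust it if your docker.img lives somewhere else.

```shell
#!/bin/sh
# Default docker.img location on a stock setup -- adjust if yours differs.
IMG=/mnt/user/system/docker/docker.img
if [ -f "$IMG" ]; then
  ls -lah "$IMG"
  status="present"
else
  status="missing: $IMG"
fi
echo "$status"
```

It's also worth re-running the earlier ls check against the array disk afterwards, to make sure no second copy quietly got recreated there.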
a_bomb Posted March 8, 2021
Thanks, I will go ahead and do that. Then if it crashes again, I guess I will just roll back to 6.8.3 again.
a_bomb Posted March 12, 2021 (edited)
Well, I got up to 11 hours after doing the above steps and then upgrading to 6.9.1. I was tailing the syslog and got this before it shut down. Going by the timestamps, it rebooted more than once; this is just what was there on the terminal I had open last night. There seem to be memory errors for sure, but it also looks like it is handling them? I'm not sure how I would go about pulling the exact sticks, short of assuming Channel 0 = Channel A etc., or pulling them one by one and checking the log. I'm thinking about pulling the 1050 as well.

Mar 12 00:49:33 Skynet kernel: EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0x6a751a offset:0x340 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0093 socket:0 ha:0 channel_mask:8 rank:1)
Mar 12 01:26:41 Skynet kernel: mce: [Hardware Error]: Machine check events logged
Mar 12 01:26:41 Skynet kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Mar 12 01:26:41 Skynet kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 7: 8c00004000010093
Mar 12 01:26:41 Skynet kernel: EDAC sbridge MC1: TSC 1a5c2d76dc3d
Mar 12 01:26:41 Skynet kernel: EDAC sbridge MC1: ADDR 727108f40
Mar 12 01:26:41 Skynet kernel: EDAC sbridge MC1: MISC 1424a5c86
Mar 12 01:26:41 Skynet kernel: EDAC sbridge MC1: PROCESSOR 0:306e4 TIME 1615530401 SOCKET 1 APIC 20
Mar 12 01:26:41 Skynet kernel: EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#1_DIMM#1 (channel:1 slot:1 page:0x727108 offset:0xf40 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0093 socket:1 ha:0 channel_mask:2 rank:5)
Mar 12 01:34:45 Skynet kernel: mce: [Hardware Error]: Machine check events logged
Mar 12 01:34:45 Skynet kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Mar 12 01:34:45 Skynet kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 7: 8c00004000010093
Mar 12 01:34:45 Skynet kernel: EDAC sbridge MC1: TSC 1b80e90da98b
Mar 12 01:34:45 Skynet kernel: EDAC sbridge MC1: ADDR d65da23c0
Mar 12 01:34:45 Skynet kernel: EDAC sbridge MC1: MISC 425a4686
Mar 12 01:34:45 Skynet kernel: EDAC sbridge MC1: PROCESSOR 0:306e4 TIME 1615530885 SOCKET 1 APIC 20
Mar 12 01:34:45 Skynet kernel: EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0xd65da2 offset:0x3c0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0093 socket:1 ha:0 channel_mask:8 rank:1)
Mar 12 01:51:10 Skynet kernel: mce: [Hardware Error]: Machine check events logged
Mar 12 01:51:10 Skynet kernel: EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
Mar 12 01:51:10 Skynet kernel: EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 7: 8c00004000010093
Mar 12 01:51:10 Skynet kernel: EDAC sbridge MC0: TSC 1dd51b124c58
Mar 12 01:51:10 Skynet kernel: EDAC sbridge MC0: ADDR 68663a340
Mar 12 01:51:10 Skynet kernel: EDAC sbridge MC0: MISC 1526a5886
Mar 12 01:51:10 Skynet kernel: EDAC sbridge MC0: PROCESSOR 0:306e4 TIME 1615531870 SOCKET 0 APIC 0
Mar 12 01:51:10 Skynet kernel: EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0x68663a offset:0x340 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0093 socket:0 ha:0 channel_mask:8 rank:1)
Mar 12 01:59:03 Skynet kernel: mce: [Hardware Error]: Machine check events logged
Mar 12 01:59:03 Skynet kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Mar 12 01:59:03 Skynet kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 7: 8c00004000010093
Mar 12 01:59:03 Skynet kernel: EDAC sbridge MC1: TSC 1ef3720de11e
Mar 12 01:59:03 Skynet kernel: EDAC sbridge MC1: ADDR 7289aeb00
Mar 12 01:59:03 Skynet kernel: EDAC sbridge MC1: MISC 4214e486
Mar 12 01:59:03 Skynet kernel: EDAC sbridge MC1: PROCESSOR 0:306e4 TIME 1615532343 SOCKET 1 APIC 20
Mar 12 01:59:03 Skynet kernel: EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#1 (channel:0 slot:1 page:0x7289ae offset:0xb00 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0093 socket:1 ha:0 channel_mask:1 rank:5)
Mar 12 02:00:06 Skynet root: /etc/libvirt: 920.4 MiB (965103616 bytes) trimmed on /dev/loop3
Mar 12 02:00:06 Skynet root: /var/lib/docker: 15.5 GiB (16609398784 bytes) trimmed on /dev/loop2
Mar 12 02:00:06 Skynet root: /mnt/cache: 191.9 GiB (206013878272 bytes) trimmed on /dev/sdj1
Connection reset by 192.168.1.63 port 22
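Since each EDAC line names the socket, channel, and slot, a quick tally per DIMM location can help rank which stick to pull first. A sketch, using short sample lines modeled on the log above; on the server itself you would grep /var/log/syslog instead of the here-document.

```shell
#!/bin/sh
# Tally corrected-error (CE) reports per DIMM location from EDAC log lines.
counts=$(grep -o 'CPU_SrcID#[0-9]*_Ha#[0-9]*_Chan#[0-9]*_DIMM#[0-9]*' <<'EOF' | sort | uniq -c | sort -rn
EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#3_DIMM#0
EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#1_DIMM#1
EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#3_DIMM#0
EDAC MC0: 1 CE memory read error on CPU_SrcID#0_Ha#0_Chan#3_DIMM#0
EDAC MC1: 1 CE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#1
EOF
)
echo "$counts"
```

One caveat: the log's own count field already aggregates (the later "31 CE ... OVERFLOW" entries show coalesced errors), so this tallies report lines, not raw error counts -- treat it as a rough ranking only.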
JorgeB Posted March 12, 2021
Check the System Event Log in the BIOS, or over IPMI if available; it might have more info on the affected DIMM.
a_bomb Posted March 13, 2021
I didn't seem to find it there. I've taken out all but 4 sticks, just whittling it down until I don't see the errors anymore. It seems it crashed and rebooted again this morning (last log entries before that happened below). I went ahead and took out the 1050 just a moment ago as well.

Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 7: cc0007c000010093
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: TSC 93f4c92e09d4
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: ADDR 5ae239040
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: MISC 403e0486
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: PROCESSOR 0:306e4 TIME 1615625437 SOCKET 1 APIC 20
Mar 13 03:50:37 Skynet kernel: EDAC MC1: 31 CE memory read error on CPU_SrcID#1_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0x5ae239 offset:0x40 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0001:0093 socket:1 ha:0 channel_mask:8 rank:1)
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: HANDLING MCE MEMORY ERROR
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: CPU 8: Machine Check Event: 0 Bank 7: cc00078000010093
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: TSC 93f4c92e889c
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: ADDR 5b17cbd80
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: MISC 52768086
Mar 13 03:50:37 Skynet kernel: EDAC sbridge MC1: PROCESSOR 0:306e4 TIME 1615625437 SOCKET 1 APIC 20
Mar 13 03:50:37 Skynet kernel: EDAC MC1: 30 CE memory read error on CPU_SrcID#1_Ha#0_Chan#3_DIMM#0 (channel:3 slot:0 page:0x5b17cb offset:0xd80 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0001:0093 socket:1 ha:0 channel_mask:8 rank:1)
Connection reset by 192.168.1.63 port 22
Benemortasia Posted January 24, 2023
On 3/3/2021 at 9:29 PM, Aurial said:
Just to add, I have the exact same drive as my cache and it's also unmountable when the server came back up from reboot after upgrading.

I can add to this. I just did an upgrade to 6.11, also having a 1TB Samsung EVO, and the same thing happened. Currently looking around for the recent solutions provided.
trurl Posted January 24, 2023
22 minutes ago, Benemortasia said:
upgrade to 6.11

Upgrade from?