-=SOLVED=- UNRAID blew up, safe mode kinda works, now what? AKA how to recover UNRAID?


Juise99


So I'm new to UNRAID, but not to NAS, Linux, file system RAID, or any of the other underlying parts that make this work. I'm giving UNRAID a shot because S2D has been lackluster performance-wise, to say the least. Fresh hardware: R3 3200G, Asus ROG STRIX B450 mobo, 8GB RAM, 2-port 10-Gigabit Intel X540, Marvell 88SE9215 6-port SATA controller, seven 4TB drives and a 512GB SSD cache.

 

Everything has been up and running great for 15 days. Fresh install of 6.7.2, 1 SMB share, Community Applications plugin, Plex Docker (Plexinc), Tautulli Docker (linuxserver), Disk Location (Ole-Henrik Jakobson), Preclear Disk (gfjardim), Unassigned Devices (dlandon), and SNMP (KZ), all installed & updated.

About two hours ago I added 6 movies to Plex via Radarr (which runs on another box). Once those finished, I manually kicked off the mover and updated my Plex library. Then BOOM! Plex goes down. I restart the docker, but it can't load any media. Reboot UNRAID, and now I can't even start the array because the Unassigned Devices plugin won't stop refreshing. Restart again: no webUI (locally or remotely).

If I reboot into safe mode, the array starts, SMB works, and the Plex docker starts but has no access to my media (the host path is there and you can see the media in the docker CLI). So now, how can I use safe mode to wipe everything short of the array and start over? Or is there something else I should be looking at?

Edited by Juise99
Solved
Link to comment

I don't see anything obvious, except your system share has files on the array instead of cache where they belong. Possibly you created docker and/or libvirt image before you added cache so they got created on the array. You don't mention any VMs, do you have any?
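
If you want to check this from a terminal rather than the GUI, here is a minimal Python sketch. It assumes array disks are mounted as /mnt/disk1, /mnt/disk2, and so on (the thread only confirms /mnt/disk1 and /mnt/cache), and the function name is just an illustration, not anything that ships with Unraid.

```python
from pathlib import Path


def system_files_on_array(mnt_root="/mnt", share="system"):
    """Return files of `share` that live on array disks instead of cache."""
    root = Path(mnt_root)
    found = []
    # Assumption: array disks are mounted as <mnt_root>/disk<N>.
    for disk in sorted(root.glob("disk[0-9]*")):
        share_dir = disk / share
        if share_dir.is_dir():
            # Collect regular files only; empty directories are harmless.
            found.extend(p for p in sorted(share_dir.rglob("*")) if p.is_file())
    return found


if __name__ == "__main__":
    for path in system_files_on_array():
        print(path)
```

Anything it prints is a system share file sitting on the array; an empty result means the share is entirely on cache (or does not exist on any array disk).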

 

Docker image isn't full now but maybe you overfilled it in the past, so I guess it's possible you have docker image corruption, but syslog doesn't have much after the reboot so can't really tell anything from that.

 

Didn't take the time to look at SMART for all of your disks. Are you getting any SMART warnings on the Dashboard?

 

You might delete docker image and recreate it so it will be on cache. Apps - Previous Apps will add your dockers back just as they were.

 

I'm not familiar with some of those plugins but CA and UD should be fine. Maybe try running without the others for a while.

 

Set up Syslog Server so you can retain syslogs after rebooting and maybe we can tell more if you continue to have problems.

Link to comment
3 hours ago, trurl said:

1) I don't see anything obvious, except your system share has files on the array instead of cache where they belong. Possibly you created docker and/or libvirt image before you added cache so they got created on the array. You don't mention any VMs, do you have any?

 

2) Docker image isn't full now but maybe you overfilled it in the past, so I guess it's possible you have docker image corruption, but syslog doesn't have much after the reboot so can't really tell anything from that.

 

3) Didn't take the time to look at SMART for all of your disks. Are you getting any SMART warnings on the Dashboard?

 


1) I switched my cache from two 240s to a single 512GB about a week ago; possibly those files got created then. No VMs.

 

2) With just Plex and Tautulli, that's highly unlikely.

 

3) SMART looks good for all the drives on the system.

 

How do I remove everything (dockers, plugins) from the normal boot environment while in safe mode?

Link to comment

So I moved the docker & libvirt img files in /mnt/disk1/system/ to /mnt/cache/system

I moved everything (except for dockerMan & dynamix) in /boot/config/plugins to /boot/config/plugins-removed

I renamed /mnt/cache/appdata/PlexMediaServer/Library to library.old
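
For anyone who would rather script the first two steps, here is a rough Python sketch. The paths are the ones from this post; the function names and the dry_run switch are my own invention, and it assumes the Docker and VM services are stopped and that nothing with the same name already exists at the destination. Run it with dry_run=True first and read the plan before letting it move anything.

```python
import shutil
from pathlib import Path


def move_images_to_cache(src="/mnt/disk1/system", dst="/mnt/cache/system",
                         dry_run=True):
    """Move every *.img under src to the same relative path under dst."""
    src, dst = Path(src), Path(dst)
    moved = []
    for img in sorted(src.rglob("*.img")):
        target = dst / img.relative_to(src)
        moved.append((img, target))
        if not dry_run:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(img), str(target))
    return moved


def set_aside_plugins(plugins="/boot/config/plugins",
                      removed="/boot/config/plugins-removed",
                      keep=("dockerMan", "dynamix"), dry_run=True):
    """Move every entry in plugins, except those named in keep, to removed."""
    plugins, removed = Path(plugins), Path(removed)
    moved = []
    for entry in sorted(plugins.iterdir()):
        if entry.name in keep:
            continue  # leave the two entries kept in the post untouched
        target = removed / entry.name
        moved.append((entry, target))
        if not dry_run:
            removed.mkdir(parents=True, exist_ok=True)
            shutil.move(str(entry), str(target))
    return moved


if __name__ == "__main__":
    # Dry run only: print what would be moved, change nothing.
    for old, new in move_images_to_cache() + set_aside_plugins():
        print(f"{old} -> {new}")
```

Both functions return the list of (source, destination) pairs, so the dry run doubles as a record of what was set aside if you want to put a plugin back later.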

 

Things appear to be back to normal. I'll just start over and re-install everything docker and plugin wise.

 

I re-installed the Plex docker. I had to re-add my Plex libraries, because they wouldn't load (hence my renaming the library directory).

 

Anything I missed?

 

Link to comment
19 hours ago, Juise99 said:

Marvell 88SE9215 6 port SATA controller

That's a nasty combination of a buggy chip and a port multiplier, all on a single PCIe lane. It will cause you a lot of problems as they are known to drop disks at random times. Either get an LSI-based SAS controller (which will need a x8 slot) or use all the motherboard SATA ports and buy an ASMedia 1061 or 1062-based dual port SATA controller for the extras.
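
Separate from the reliability problem, the single lane also caps throughput. A back-of-envelope sketch in Python, assuming the card links at PCIe 2.0 x1 (5 GT/s with 8b/10b encoding, so about 500 MB/s before protocol overhead); the link generation is my assumption, so check what your card actually negotiates.

```python
def per_drive_mb_s(link_mb_s, active_drives):
    """Even split of the link bandwidth across drives busy at the same time."""
    return link_mb_s / active_drives


# Assumption: PCIe 2.0 x1. 5 GT/s * 8/10 (8b/10b) / 8 bits per byte.
PCIE2_X1_MB_S = 5e9 * 8 / 10 / 8 / 1e6

if __name__ == "__main__":
    # Worst case for a 6-port card: all six ports busy, e.g. a parity check.
    print(f"{per_drive_mb_s(PCIE2_X1_MB_S, 6):.0f} MB/s per drive")
```

That is an upper bound per drive whenever every port is busy at once, which is exactly what happens during a parity check or rebuild.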

Link to comment
1 hour ago, John_M said:

That's a nasty combination of a buggy chip and a port multiplier, all on a single PCIe lane. It will cause you a lot of problems as they are known to drop disks at random times. Either get an LSI-based SAS controller (which will need a x8 slot) or use all the motherboard SATA ports and buy an ASMedia 1061 or 1062-based dual port SATA controller for the extras.

I will keep that in mind if I have problems down the road. I chose the Marvell 88SE9215 because it was listed as working in the hardware compatibility wiki. The Asus ROG STRIX B450 mobo has 6 SATA 3 ports, which are all in use, and my system has 11 drives in total, so I needed something that worked. The previous 2 controllers I had wouldn't even make it through building the array. One of them was an 8-port ASM1806 & ASM1061 combo that wouldn't even allow the system to boot.

 

I would like to thank everyone for their help/input! Things seem back to normal now, hopefully I don't have this issue again.

Link to comment
7 hours ago, Juise99 said:

I needed something that worked.

 

The trouble is, it doesn't work reliably. Marvell stuff worked well when Unraid used a 32-bit kernel - unfortunately, the compatibility guide is very out of date. The ASM1061 works fine out of the box controlling one or two disks. The ASM1806 is a PCI bridge that lets a card manufacturer hang multiple controllers across a single lane. Here's the original thread that brought the Marvell problem to light. Since then it has just got worse, with the workarounds failing too - if you do a search you'll find that many people have had problems.

 

 

Link to comment
