Total Unraid OS lockup 6.7.0



Ok guys, I need your wisdom. I'm having an issue and have no idea how to even start troubleshooting...

 

I had the server running just fine. Then the new 6.7.0 update rolled out. After installing the update, a PCI SATA expansion card was no longer recognized, not even in the BIOS (it plugs into a PCI slot and provides 4 SATA ports, 2 of which were in use by the parity drives). The SATA expansion card has its own 'BIOS' that now only offers to create a RAID array, something I don't need it for. So I ordered a new motherboard with the 10 SATA ports I needed and installed it. The server loaded up and showed all my drives running normally again.

 

This is where it got strange. The server would run normally for 24-36 hours, then freeze. By freeze I mean that all Docker apps are unresponsive and the web console neither prompts for a password nor loads at all. A forced shutdown and reboot resolves it. The next time, I booted into GUI mode. Again it froze 24-36 hours later. On the directly connected mouse/keyboard/monitor, the web console was where I left it, the mouse cursor moved and the page scrolled in the Firefox browser, but all the console buttons were unresponsive. The only resolution was a hard reboot. So I ordered a second new motherboard and swapped it in: same issue.

 

So, please let me know where to start. I will warn you I'm tech savvy but have very little experience with Unraid or Linux, so please be specific and detailed, and I will be glad to take any guidance you can offer.


Thermaltake X9 case
ASRock X370 Taichi motherboard (the second and third/current motherboard)
32 GB Corsair Vengeance LPX (4x8 GB) DDR4 DRAM 3200 MHz C16
EVGA SuperNOVA 650 W G2, 80+ Gold
Ryzen 7 1700
1x SanDisk Cruzer Fit 32 GB - Unraid OS drive
1x Samsung SSD 850 500 GB cache drive
7x 8 TB platter drives for storage
2x 10 TB platter drives for parity
APC 1500VA UPS
 

Link to comment
2 hours ago, Squid said:

Don't use a Ryzen, but my first plan of attack would be reviewing the Ryzen thread here, in particular the posts about C-States in the BIOS.  Not sure if AMD has fixed all of that yet or not.

I'm not seeing a Ryzen thread on the forum (a bunch of posts, but nothing like what you described). Can you point me in the right direction?

Link to comment

So, after reading the posts you pointed me to, I screwed everything up and thought it would be worth posting, if for nothing else, to let you laugh at my stupidity....

 

The first things I saw to do were to update my mobo and then disable C-states (the guy who tested the C-state stuff described what sounded exactly like my issue).

 

So, first up, update the mobo. ASRock states that to install the newest version (5.5) you need to first install 5.1. OK, then to install 5.1, you need to first update the VGA driver. The VGA driver is only available as a Windows exe installer, but running Unraid, I have no Windows environment.

 

So then I worked to create a bootable Windows installation on a USB drive. I got that done, then Windows setup listed a bunch of drives. This is where I destroyed my Unraid server. You see, I saw a series of 4 GB partitions. Reading more about the USB boot drive/installer, I saw that it had taken my 64 GB USB drive and split it into a series of 4 GB partitions. Since you can't install Windows 10 with only 4 GB of space, I started deleting partitions. Then I saw a few 7 GB partitions. I deleted two of them but didn't think much of it. I got through the Windows install and booted.

 

It was here that I realized the 7 GB partitions were actually the 8 TB HDDs from my Unraid array. I went ahead and updated all the mobo stuff that needed to be done, then disabled the C-state options in the BIOS.
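A note on the sizes: drive makers quote capacity in decimal units (1 TB = 10^12 bytes), while Windows reports sizes in binary units under the same labels, so an 8 TB drive never shows up as "8". Whether that contributed to the misreading here is only a guess, not something confirmed in this thread. A minimal Python sketch of the conversion (the function name is purely illustrative):

```python
def decimal_bytes_to_tib(size_bytes):
    """Convert a byte count to binary tebibytes (what Windows labels 'TB')."""
    return size_bytes / 2**40

# Marketing capacities in bytes (decimal units).
eight_tb = 8 * 10**12
ten_tb = 10 * 10**12

print(f"8 TB drive  -> {decimal_bytes_to_tib(eight_tb):.2f} TiB")
print(f"10 TB drive -> {decimal_bytes_to_tib(ten_tb):.2f} TiB")
```

The safer habit is to identify disks by model and serial number rather than by the size column before deleting anything.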

 

I booted into Unraid. Now Unraid shows all drives, and two of the array drives are unmountable and need to be formatted (note: both parity drives were unaffected). I allowed Unraid to format the two drives and started a parity check.

 

My parity checks usually take 15-24 hours, so we will see how much data can be rebuilt or is lost.

 

Then I'll start to tackle the fact that none of my Dockers are showing....

 

Long story short, I'm an idiot and should pay closer attention. But if you have any advice or comments on next steps, feel free to post.

Link to comment

The information about updating the VGA drivers before updating the BIOS is to make sure you don't lock yourself out of Windows. Since Unraid isn't Windows you can and should ignore that information.

 

Unraid works well with Ryzen processors, especially the second generation ones. The problem with some early first generation ones is that when idle they can drop into a low power mode known as C-6 from which they can't wake up. There are a number of things you can try to fix this, the most useful one being this:

If that doesn't help I'd go for the zenstates option in the go file, and after that disabling C-states in the BIOS. The Power Supply Idle Control was only introduced comparatively recently but has always worked for me and I've built over 20 Zen-powered boxes (all running Linux, though not all running Unraid) since Ryzen was first introduced. I have a very early R7 1700 that was particularly problematic until this option was made available with a BIOS update. With 2000-series chips you don't need any mitigations.

 

You should not have let Unraid format your unmountable disks - the situation was possibly recoverable until you did that. None of your data will be rebuilt. The data on the mountable disks that you have not formatted should hopefully be intact. I hope you have backups of the other two disks.

 

Edited by John_M
Better wording
Link to comment

So, I have a few questions...

 

First, for my future reference, how would I have recovered the data from the unmountable drives? (I'm not doubting you, I want to understand.)

 

Second, fortunately, if I can recover my Dockers, they should queue up the lost data and pull it in. I'll just need to wait the week or two for all the data to load. But I'm unsure whether Docker data was lost. For some reason I thought it was on the Unraid USB drive, but I think I remember that parts were in shares (which could have been lost). Any advice here would be particularly helpful.

Link to comment

Since you have two parity disks you can afford to lose any two array disks (but no more) in your server and still recover the data. So if all the rest are OK you could have rebuilt the two corrupt ones. That's because they were corrupted outside of the Unraid environment. You formatted them inside the Unraid environment, which updates parity, which means you can now no longer recover them. (Not strictly true - a government agency could no doubt forensically recover your data, at a cost.)
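To make the difference concrete, here is a toy Python sketch. It is a deliberate simplification and not Unraid's actual code: it models only a single XOR parity disk and one lost data disk, whereas the second parity disk uses a different calculation that extends the same idea to two lost disks. The disk contents and function names are made up for illustration.

```python
from functools import reduce

def xor_parity(disks):
    """Return the byte-wise XOR of equally sized disks (single parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))

def rebuild(missing_index, disks, parity):
    """Reconstruct one disk from parity plus all the other disks."""
    others = [d for i, d in enumerate(disks) if i != missing_index]
    return xor_parity(others + [parity])

# Toy array: three 4-byte "disks" and their parity.
disks = [b"plex", b"docs", b"pics"]
parity = xor_parity(disks)

# Case 1: disk 1 is overwritten OUTSIDE the array (e.g. by another OS installer).
# Parity is not touched, so it still describes the old contents.
disks_case1 = list(disks)
disks_case1[1] = b"\x00\x00\x00\x00"
recovered_case1 = rebuild(1, disks_case1, parity)

# Case 2: disk 1 is formatted THROUGH the array. Every write also updates
# parity, so parity now describes the empty disk.
disks_case2 = list(disks)
disks_case2[1] = b"\x00\x00\x00\x00"
parity_case2 = xor_parity(disks_case2)
recovered_case2 = rebuild(1, disks_case2, parity_case2)

print(recovered_case1)  # b'docs'
print(recovered_case2)  # b'\x00\x00\x00\x00'
```

In the first case the rebuild returns the original contents; in the second it faithfully rebuilds an empty disk, because that is what parity now describes.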

 

I don't know anything about your configuration. The only thing that matters about Docker containers is the appdata since the containers themselves can be re-downloaded easily. It is not uncommon for the docker.img file to become corrupt and need to be rebuilt. It is very easy to delete it and recreate it from scratch. But the appdata is where the containers store their configuration, databases, etc. Most people store their appdata share on their cache disk. If you did too and it's still intact then you're in the clear.

 

The best two pieces of general advice I can give you are:

  • If in doubt, ask here before proceeding;
  • When Unraid formats an array disk it does exactly the same as when Windows formats a disk or a digital camera formats a flash card - it writes a new, empty file system to it and renders any data previously stored there inaccessible. That's what the format operation does and the presence of parity disks does not change that. TL;DR: If you format a disk, consider its contents lost - but you already knew that, didn't you?
Link to comment

The unfortunate thing is that you were misled by the BIOS upgrade procedure. ASRock doesn't support any other operating system than Windows on its consumer motherboards (even high end ones, like the Taichi) so the instructions assume the user is running Windows and it seems that updating the BIOS without updating the graphics card driver first can leave you with a blank screen. None of this is relevant to the Linux user and the correct upgrade procedure is actually very straightforward. The only part that was relevant was the two step process - upgrade to 5.1 first then reboot and upgrade to 5.5. They seem to need to do that when there's a major upgrade - in this case, support for the 3000 series of processors.

Link to comment

Well, thank you for all this information. What you are saying makes total sense; I'm not sure how I didn't see the situation this way before.

 

So, in the same vein of "when in doubt, ask here before proceeding", I have some thoughts on next steps...

 

As I mentioned, my parity check is still running, but all data on the two now-formatted drives is lost. It seems like it would be fine to start reinstalling Dockers, hopefully recover the configs, start using them and get the server running my various apps. (I primarily use Unraid for a VPN server and Plex.)

 

Seems like a safe enough action, but here's my concern.

 

The parity check started and the server is running, and it didn't bat an eye when two drives were 'dead'. So I'm concerned that the parity check will result in some series of actions such that, in the end, I should have waited before using Docker... thoughts?

 

Thank you again

Link to comment

Is it a correcting parity check? Because two of your disks were written to outside of Unraid there are going to be some parity discrepancies, so the corrections need to be written to the parity disks and will be reported as errors. If it is a non-correcting parity check you will see errors but they won't be corrected so you'll need to run a second, correcting check. Parity checks are extremely disk I/O intensive. You can use the server and you can write to the array while a parity check is in progress but it will slow it down quite markedly. If your application writes to the cache disk, however, either because it is writing to a cache-only share or a cache-yes share, then this will have negligible effect on the speed of the parity check.
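As a rough illustration of the two modes, here is a toy Python sketch. It is an assumption-laden simplification and not how Unraid implements the check: it uses single XOR parity and counts errors per byte, and the function name and sample data are invented for the example.

```python
def parity_check(disks, parity, correct=True):
    """Compare stored parity with parity recomputed from the data disks.

    Returns (sync_errors, resulting_parity). Mismatches are always counted,
    but they are only written back to parity when correct=True.
    """
    result = bytearray(parity)
    errors = 0
    for pos, column in enumerate(zip(*disks)):
        expected = 0
        for byte in column:
            expected ^= byte
        if result[pos] != expected:
            errors += 1
            if correct:
                result[pos] = expected
    return errors, bytes(result)

disks = [b"\x01\x02", b"\x04\x08"]
stale_parity = b"\x05\xff"  # second byte is out of sync (should be 0x0a)

errors_nc, parity_nc = parity_check(disks, stale_parity, correct=False)
errors_c, parity_c = parity_check(disks, stale_parity, correct=True)
errors_again, _ = parity_check(disks, parity_c, correct=False)

print(errors_nc, parity_nc == stale_parity)  # 1 True
print(errors_c, parity_c.hex())              # 1 050a
print(errors_again)                          # 0
```

Both modes report the same error count on the first pass; only the correcting run leaves parity in sync, which is why a non-correcting check that finds errors has to be followed by a correcting one.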

 

While the parity check is underway I would run a file system check on the cache disk. If there's no problem with it then your Docker containers might well be fine and not need any maintenance at all. When the parity check is complete you ought to run file system checks on each of your data disks. They should already appear as mounted and the two freshly formatted ones will, of course, be empty.

 

28 minutes ago, Aerodb said:

What you are saying makes total sense, not sure how I didn't see the situation this way prior.

It's much easier to think things through when calm and unpressured. When something goes wrong and you get that mounting feeling of panic it's very easy to get flustered and make mistakes.

Link to comment

I'm not sure if it is a correcting parity check or not. I don't see any indication one way or the other. However, I do see the check progress showing 1845871 sync errors corrected, so probably yes on the correcting type?

 

Also, when I select the cache disk I see an option for a file system check, but it states that I can't run it until the array is in maintenance mode. So I think I need to wait until the parity check is done to run that.

 

Finally, I did re-enable Docker and installed one Docker (Plex), and all the settings were default. So for now at least, I think my Dockers will need to be set up fresh, but perhaps the cache file system check will help. It doesn't help that I forgot how to set up a Docker, so I'll be relearning that.

 

Either way this has all been a painful lesson, but hopefully I won't make the same mistakes again.

Link to comment

Did you use the Previous Apps option on the Apps tab to install Plex? You would need to do it that way if you wanted the container settings to be re-installed as they were previously.

Link to comment
10 hours ago, Aerodb said:

I'm not sure if it is a correcting parity check or not. I don't see any indication one way or the other. However, I do see the check progress showing 1845871 sync errors corrected, so probably yes on the correcting type?

The default is a correcting check so unless you deliberately unchecked the box that's what it will be. I'm not sure if there's a clue in the GUI but you can tell from the entry in the syslog.

 

10 hours ago, Aerodb said:

Also, when I select the cache disk I see an option for a file system check, but it states that I can't run it until the array is in maintenance mode. So I think I need to wait until the parity check is done to run that.

Ah, ok. That's right. I was forgetting. I always start mine in maintenance mode if there's any sign of trouble.

Link to comment
