unRAID freeze completely randomly



Hello!

I decided I wanted to try unRAID for a new NAS build, so I installed it and have had it running for a few months. But I'm experiencing issues severe enough that I can't consider this build complete.

I get errors at random. It wasn't as bad in the beginning, but back then I never really hammered the NAS much either.

I get a whole bunch of different errors. Attached are some of the ones I've screenshotted via my BMC. I obtained a diagnostics dump after confirming an error had appeared on screen but before it made Unraid unresponsive, which always happens eventually.

I also suspect it could be related to Docker. I currently only run an rTorrent container. I've tried heavily limiting the number of files it's allowed to keep open at once and limiting its ability to create connections, but that had no effect; Unraid still crashes completely. The server stays in a good state longer if I don't start any containers, but once I start rTorrent the problems always begin within about two days of uptime.
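For reference, the kind of limits I mean look roughly like this (the image name and exact values here are placeholders, not my actual settings):

    # Cap the container's open-file limit with Docker's --ulimit flag
    # ("my-rtorrent-image" is a placeholder)
    docker run -d --name rtorrent --ulimit nofile=1024:1024 my-rtorrent-image

    # And in .rtorrent.rc, cap open files/sockets and peer connections:
    network.max_open_files.set = 128
    network.max_open_sockets.set = 300
    throttle.max_peers.normal.set = 40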

I've run MemTest86 on the hardware to check it, but it found nothing.

unraid error 2.png

unraid error.png

unraid error3.png

tower-diagnostics-20190506-2015.zip


I've added the diagnostics dump to my OP. I've rebooted now, but my trial has expired along with my two extensions, so I can't access the WebUI anymore. And /dev/md3 doesn't seem to exist either. I suspect this is what Unraid has named its pool, and that it's not a reference to one of my disks.

I have run a parity check several times, and it completed without any issues.

5 hours ago, Rudde said:

And /dev/md3 doesn't seem to exist either.

The mdX type devices all relate to array disks, with 'X' being the disk slot number. This means that md3 is equivalent to disk3.

7 hours ago, trurl said:

That diagnostic is nearly 2 weeks old.

 

Have you done a memtest recently?

The memtest was done in the same time frame. The server hasn't really been used since that diagnostics was taken, because the problems described here had already been present for a long time before it.

3 hours ago, itimpi said:

The mdX type devices all relate to array disks, with 'X' being the disk slot number. This means that md3 is equivalent to disk3.

Okay, thanks. Do they count Disk 1 -> md1, Disk 2 -> md2, and so on? And where do cache and parity 1 and 2 land in this scheme?

4 hours ago, Rudde said:

Do they count Disk 1 -> md1, Disk 2 -> md2, and so on? And where do cache and parity 1 and 2 land in this scheme?

disk1 is md1, etc. The md devices are only for disks mounted in the parity array, so cache isn't part of this. Parity is sometimes referred to as disk0, but it can't be md0 because it doesn't have a filesystem.

 

If you look at syslog you will see Unraid taking inventory of the disks as slot0, slot1, etc. Parity2 is slot29, after any possible data disks.
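If you want to see the mapping yourself, something along these lines from the console should show it (disk3 is just an example):

    # The device mounted at a disk share is the corresponding md device
    df -h /mnt/disk3          # shows /dev/md3 for an array data disk

    # Unraid's disk inventory lines in the syslog
    grep -i 'import' /var/log/syslog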

 

md is really about the disks as they are used with parity. When working with the md devices, parity is part of that: writing to an md device updates parity, a disabled disk can still be accessed as an md device via the parity calculation, and when repairing a filesystem you always use the md device so parity will be maintained.
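For example, a filesystem check/repair of disk3 would be run against the md device so parity stays in sync (array started in Maintenance mode; disk3 is just an example):

    xfs_repair -n /dev/md3    # -n = no-modify dry run, report problems only
    xfs_repair /dev/md3       # actual repair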

 

For disks outside the array, the sd device will sometimes be referred to instead. Don't assume a specific sd device always refers to the same disk, since that can change between boots, especially if you add or remove disks.
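If you need a stable way to identify a particular physical disk regardless of boot order, the serial-based links are one option:

    # Persistent, serial-number-based names that point at the current sdX devices
    ls -l /dev/disk/by-id/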

  • 2 weeks later...

I've managed to run xfs_repair on the device in question. It has had absolutely no effect on any of the issues described in this thread. I have no idea what to do next. Is there anything in the diagnostics files indicating what's wrong?

8 hours ago, trurl said:

Could you give us some details about that? The output from running the repair would be preferred.

There were no errors in it; it was the standard output.

These issues also occurred well before that error popped up, and it feels like the error least relevant to the symptoms I'm experiencing. Why would a completely corrupt storage disk even crash unRAID?

On 5/29/2019 at 5:36 PM, trurl said:

Can you get us a new diagnostic from after running the repair?

Yes. Here is a completely fresh one I took just now, after a fresh boot, having not turned it on since the last crash.

No errors have appeared on the monitor yet during this boot, prior to taking the diagnostics. I will let it run, see if an error appears on screen, and take another one before it becomes completely unresponsive and I have to hard-reset it.

tower-diagnostics-20190531-0916.zip


Now it has crashed again, and it is not responsive: it's on and displays the error, but I can't connect to it over SSH or access the WebUI to get another diagnostics.

While this error is not one of those I've managed to screenshot, it is an error I've gotten before: it mentions "nf_nat_setup_info".

 

This happened after I started stress-testing my rTorrent container, since I've suspected it's related to Docker in some way. I started adding a few torrents, one after another, and it crashed very quickly.

 

When I got this error message last time, I tried to move things away from the SSD because I didn't want it involved with the Docker instances; I was afraid it had depleted sectors, so I moved the data to rule that out.

