Something went terribly wrong!


Hello everyone,

 

A couple of days ago I posted here about two disks failing, in this topic:

Everything seemed to be working fine, but yesterday the disk that had had problems started showing errors again. So I bit the bullet, ordered a new drive (it's currently on its way), and decided to just take the faulty disk out of the array. I have 2 parity drives, so it should be fine.
Yesterday, though, out of nowhere, some of my shares became inaccessible, and in the GUI they were marked as not present. I restarted and they came back, but after a while the same thing happened.
Today I did another restart, and after a few hours I found the array in the following state:

[Screenshot: array state]

Something is clearly wrong and I don't know what it might be. All my drives except the Seagate drives are connected through the "LSI SAS2116 PCI-Express Fusion-MPT SAS-2" card; could it be faulty?

I'm getting scared 😟. What steps should I take?

I attached both the system logs and the diagnostics below.

Thank you very much for your help!

ivpiter-diagnostics-20220615-1316.zip ivpiter-syslog-20220615-1116.zip

3 hours ago, JorgeB said:

The first thing to do is to check/replace the cables for disk2 and the 2TB pool disk, then post new diags after array start.


I replaced the whole MiniSAS cable with one I had on my other computer, and connected the 2TB drive directly to the motherboard with a SATA cable.
I also removed and reseated the LSI card just in case (it looked fine, though).

Now I have disk2 disabled and disk8 showing "Unmountable". Disk4 was taken out of the array.

[Screenshot: array status]

ivpiter-syslog-20220615-1449.zip

11 minutes ago, JorgeB said:

That, and check the filesystem on disks 2, 4 and 8.

I have attached the diagnostics below. I'll start checking the filesystems, but disk4 is not currently installed in the array.

3 minutes ago, trurl said:

Do you have backups of anything important and irreplaceable? Even dual parity is not a substitute for backups.

 

Yes, I do have another server with the important data, but I also don't want to lose all the "unimportant" media.

ivpiter-diagnostics-20220615-1709.zip

31 minutes ago, JorgeB said:

Doesn't sound like the server is very stable; trying to repair/rebuild disks with an unstable server can cause more issues. See if you can get new diags after array start.

It has been working rock solid for over a year; it started behaving erratically two weeks ago. Nothing in the configuration changed. I tested the RAM last time and have no idea how to test it further. Could it by any chance be a problem with the USB drive? I remember having problems with them in the past.

I brought the server back up and started the array, which looked fine for a second, but then I got this message: "Automatic Unraid no array operation will start". The array doesn't seem to have started, and it doesn't give me any options except shutdown or reboot.

[Screenshot: array status]

ivpiter-diagnostics-20220615-2020.zip

1 hour ago, jcarre said:

Automatic Unraid no array operation will start.

That's from the parity check tuning plugin.

 

6 minutes ago, jcarre said:

After yet another reboot, the array came back online but disk 2 is still disabled.

That's expected: it will remain disabled until it's rebuilt. The same goes for disk4: you either need to replace the missing disk or remove it from the array.

Nothing assigned as disk3 which I assume is correct.

 

Disk2 disabled, disk4 disabled and missing, those emulated disks mounted and all other disks green and mounted.

 

Is that correct?

 

It looks like your lost+found share is on a lot more disks than just these we have been working on. Go to User Shares, click Compute... for lost+found, and post a screenshot.

 

Have you examined lost+found at all? It often is a lot of work to actually recover any files from it, and you may have a lot of data there.

 

Until you get your array stable, you might consider not rebuilding on top of any existing disk and only rebuilding onto spares, so the original disks aren't changed and might be used to recover files that repairs and rebuilds don't.

6 minutes ago, trurl said:

Nothing assigned as disk3 which I assume is correct.

 

Disk2 disabled, disk4 disabled and missing, those emulated disks mounted and all other disks green and mounted.

 

Is that correct?

 

 

Yes, disk3 was taken out of the array a long time ago (to shrink the array). Disk2 is disabled, and disk4 is out of the array, although it's still inside the case; I am waiting on a replacement for it. All the other drives are green.

Here is a screenshot of the lost+found share and of the ncdu output; it looks like there are 18+ TB of data in there.

[Screenshot: lost+found share size]

[Screenshot: ncdu output for lost+found]

 

I now have a new drive on the way, which should arrive on Monday, plus the one I took out before (the Seagate).
What is the best course of action?

The reason lost+found is on all those disks is because you have done filesystem repair on all those disks, some of which are not part of the problems in this thread.

2 hours ago, jcarre said:

It has been working rock solid for over a year; it started behaving erratically two weeks ago.

Were all these repairs done in the last 2 weeks?

1 minute ago, trurl said:

The reason lost+found is on all those disks is because you have done filesystem repair on all those disks, some of which are not part of the problems in this thread.

Were all these repairs done in the last 2 weeks?

 

I don't remember doing any filesystem repairs before the previous thread I started, two weeks ago. In that one I repaired disks 8 and 4. Today I did 2, 4 and 8. So I have no idea why 6, 7 and 10 are appearing here.

I had one disk fail once, which I replaced and rebuilt from parity. There were also a lot of changes due to replacing drives, but parity operations always showed 0 errors. I also used to run monthly parity checks (they are now set to once every 3 months) and never had a problem.

3 minutes ago, jcarre said:

I have no idea why 6,7 and 10 are appearing here

There are only two possible reasons: you repaired them, or you somehow moved lost+found from other disks to those disks. Unraid will not have moved them there by itself, and mover only moves between cache and array, never between array disks.

3 minutes ago, trurl said:

Does your controller have adequate cooling? Are you sure you don't have power problems? Any power splitters?

 

That could very well be a problem, since the card doesn't have a fan and it's been really hot in here lately. I use a Norco rack case with three fans at the front, but nothing at the back. Perhaps I could zip-tie one of those 40mm fans to the card's heatsink.


In terms of power, the server is connected directly to a UPS, which is plugged directly into the wall. I am having a lot of power outages, though, since the AC sometimes trips the breakers.

 

10 minutes ago, trurl said:

There are only two possible reasons, you repaired them, or you somehow moved lost+found from other disks to those disks. Unraid will not have moved them there by itself and mover only moves between cache and array never between array disks.

 

It's possible that I did that some time ago; I can't remember. The sizes on these drives are a lot smaller, though.

8 minutes ago, trurl said:

 

Oh yes, this was the very first day I installed the new motherboard + CPU + RAM combo. I think that because I had XMP enabled, everything started getting corrupted. I disabled it and all was good. It's possible that I repaired the disks then.

 

10 minutes ago, trurl said:

What do you mean by that?

 

I meant that the "files" are a lot smaller; there is a lot less data in there.
