
Disks Getting Red X


Mogo


This issue probably started a week ago when I added a new drive to the server.   There are 10 drives in the array.  A random disk will get a red x.  I do a rebuild on it and within 12 hours another disk gets a red x. 

 

Also, not sure if it's related, but when I stop the array the CPU load on the server fluctuates between 60% and 100%.  I don't know how the server can be doing stuff if the array is offline.

 

Any idea what's causing these issues?  Thanks in advance for any help.


Since this started after adding a new drive, are you sure that the PSU is up to the job of handling the extra drive?

 

If that is not the cause, then you need to look at what else is shared by all the drives experiencing problems. For instance, are they on the same HBA? If so, what model is it? If it is Marvell based, that can explain drives randomly having problems, as Marvell controllers are known to randomly drop drives for no obvious reason.

10 hours ago, jonathanm said:

PSU model? HBA? Hot swap bays? Heat control issues?

PSU = EVGA SuperNOVA 850 P2 220-P2-0850-X1

HBA = LSI 9305-24i

Hot swap bays = Norco RPC-4224 24 Bay, (1 parity and 10 drives in the array) so basically 11 drives are in hot swap cages

Heat control issues = I don't believe so, most of the drives are around 30°C only one drive is listed at 36°C (weird)

 

7 hours ago, itimpi said:

Since this started after adding a new drive, are you sure that the PSU is up to the job of handling the extra drive?

 

If that is not the cause, then you need to look at what else is shared by all the drives experiencing problems. For instance, are they on the same HBA? If so, what model is it? If it is Marvell based, that can explain drives randomly having problems, as Marvell controllers are known to randomly drop drives for no obvious reason.

 

I believe the PSU should be good enough at 850 watts (unless I'm wrong).  All the other specs you asked about are listed above, and yes, all the drives use the same HBA.  I've had that HBA card installed for just over a year now, and this is the first time I've experienced something like this with random red x drives.  One other recent change: I installed the following Docker containers for monitoring (apcupsd-influxdb-exporter, Grafana, HDDTemp, Influxdb, telegraf).  I disabled these containers yesterday.  I'm not sure whether they continuously write data and perhaps were causing the drives to red x (a long shot theory on my end; I'm just trying to roll back any changes made before all this instability started).

 

6 hours ago, johnnie.black said:

Please post the diagnostics: Tools -> Diagnostics, ideally after a disk gets disabled.

I started a rebuild for one of the drives after a few shutdowns so I assume the particular diagnostic you are looking for is no longer available.  If a drive red x's again (this is the 3rd time so far) and I believe one will based on my luck, I will get a report for you and post it.  Based on the current trends, probably within the next 24 hours or so I should have that available.

3 hours ago, Michael_P said:

Do you have all of the 4224's backplanes plugged into 1 cable coming off of the power supply?

Yes, one cable goes into the power supply and has 4 molex connectors on the other end.  The 4 molex connectors are plugged into the first 4 backplanes starting from the top.  Only the first 3 backplanes are populated.  The 4th does not have any drives attached to it.  The 5th & 6th backplanes are not connected to anything yet.

 

So, as I mentioned before, I was restoring a drive that hit the red x (3rd time, a different drive), except this time, after the parity rebuild finished and all looked good, it says "Unmountable: No file system" but the drive has a green ball now.  When I scroll down to the array operation section, it gave me the option to format the drive saying all data will be lost.  Do I do this and then try to rebuild the drive again?  Also, I noticed that my VMs tab is missing.

 

I powered off the machine and started it up again, same situation.  I noticed that for some reason the VM Manager in the settings menu had the Enable VMs option set to No.  This is strange since I did not change it.  Anyway, I changed this back to Yes and I can see the VMs tab again.  For now I have the VMs all turned off until I can resolve these issues.

Edited by Mogo
2 hours ago, Mogo said:

"Unmountable: No file system" but the drive has a green ball now.  When I scroll down to the array operation section, it gave me the option to format the drive saying all data will be lost.  Do I do this and then try to rebuild the drive again?

NO!!!! If you format, all your data on any unmountable drive will be erased! If you format a drive, parity will be updated to reflect the format, and rebuilding will result in a freshly formatted drive with no files, just like you told it to do.

 

https://wiki.unraid.net/index.php/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui
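To illustrate why a format followed by a rebuild gives back an empty disk, here is a minimal sketch. It assumes single parity behaves as a byte-wise XOR across the data disks; the 8-byte "disks" and the helper names `xor_parity` and `rebuild` are made up for the illustration and are not Unraid internals.

```python
from functools import reduce

def xor_parity(disks):
    """Return the byte-wise XOR of equally sized disk images."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*disks))

def rebuild(surviving_disks, parity):
    """Reconstruct the one missing disk from the other disks plus parity."""
    return xor_parity(surviving_disks + [parity])

# Three toy "disks" (hypothetical contents); index 2 plays the unmountable one.
disks = [b"movies..", b"photos..", b"my-files"]
parity = xor_parity(disks)

# Rebuilding without formatting returns the original bytes.
restored = rebuild(disks[:2], parity)

# A format is a write like any other, so parity is updated to match it.
disks[2] = b"\x00" * 8  # stand-in for a freshly created, empty file system
parity = xor_parity(disks)

# Rebuilding now faithfully reproduces the empty disk.
after_format = rebuild(disks[:2], parity)

print(restored)
print(after_format)
```

In this toy model parity has no notion of files, so it can only reproduce whatever was last written to the disk it protects. That is why repairing the file system, not formatting, is the step that keeps the data.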

 

2 hours ago, Mogo said:

Yes, one cable goes into the power supply and has 4 molex connectors on the other end.  The 4 molex connectors are plugged into the first 4 backplanes starting from the top.  Only the first 3 backplanes are populated.  The 4th does not have any drives attached to it.  The 5th & 6th backplanes are not connected to anything yet.

So all your drives are being powered by just one string from the PSU? That may well be causing a voltage drop that would cause your symptoms. Ideally you would only use one connector from each cable so you have the most possible wires running from the PSU to the backplane.
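As a rough worked example of the voltage drop described above, the sketch below applies Ohm's law to the 12 V wire and its ground return. Every figure in it is an assumption chosen for illustration (wire gauge, cable length, per-drive spin-up current), not a measurement from this server, and it ignores connector contact resistance, which would add to the drop. It also treats the whole load as flowing through the full cable length, which is a simplification for a daisy-chained string.

```python
# All figures below are illustrative assumptions, not measurements.
OHMS_PER_METRE_18AWG = 0.021  # approximate resistance of 18 AWG copper wire
CABLE_LENGTH_M = 0.6          # assumed PSU-to-backplane cable run
SPINUP_AMPS_12V = 2.0         # assumed 12 V draw per drive while spinning up

def voltage_drop(drives_on_cable, length_m=CABLE_LENGTH_M):
    """Drop across the 12 V wire plus its ground return for one cable."""
    current = drives_on_cable * SPINUP_AMPS_12V
    round_trip_resistance = 2 * length_m * OHMS_PER_METRE_18AWG
    return current * round_trip_resistance

one_cable = voltage_drop(11)    # all 11 drives fed by a single string
three_cables = voltage_drop(4)  # busiest cable if the load is split 4/4/3

print(f"one cable:    {one_cable:.2f} V drop, {12 - one_cable:.2f} V at the backplane")
print(f"three cables: {three_cables:.2f} V drop, {12 - three_cables:.2f} V at the backplane")
```

The point of the comparison is that the drop scales with the current carried by each wire, so the PSU's total wattage says little about what the drives actually see at the end of one heavily loaded cable.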

9 minutes ago, jonathanm said:

So all your drives are being powered by just one string from the PSU? That may well be causing a voltage drop that would cause your symptoms. Ideally you would only use one connector from each cable so you have the most possible wires running from the PSU to the backplane.

This.

 

I was having problems with drives dropping from the array, replaced HBAs, cables, drives, until I figured out there were voltage drops causing random drives to fall out of the array.

16 minutes ago, jonathanm said:

NO!!!! If you format, all your data on any unmountable drive will be erased! If you format a drive, parity will be updated to reflect the format, and rebuilding will result in a freshly formatted drive with no files, just like you told it to do.

 

https://wiki.unraid.net/index.php/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui

 

So all your drives are being powered by just one string from the PSU? That may well be causing a voltage drop that would cause your symptoms. Ideally you would only use one connector from each cable so you have the most possible wires running from the PSU to the backplane.

 

Damn.  Thanks for letting me know.  I'm going to shut down the server and see if I can run more power cables to the backplanes.  Then I'll look at those instructions you provided to fix the file system.  I'll report back the outcome when it's all done.

 

5 minutes ago, Michael_P said:

This.

 

I was having problems with drives dropping from the array, replaced HBAs, cables, drives, until I figured out there were voltage drops causing random drives to fall out of the array.

 

Thanks man, I wouldn't know anything about voltage drops and stuff like that.  I saw 4 connectors, so I figured I could connect 4 things.

Just now, Mogo said:

 

Damn.  Thanks for letting me know.  I'm going to shut down the server and see if I can run more power cables to the backplanes.  Then I'll look at those instructions you provided to fix the file system.  I'll report back the outcome when it's all done.

 

 

Thanks man, I wouldn't know anything about voltage drops and stuff like that.  I saw 4 connectors, so I figured I could connect 4 things.

I figured the same thing, "it's an 850W power supply, what could go wrong..."

On 12/17/2019 at 6:45 PM, jonathanm said:

NO!!!! If you format, all your data on any unmountable drive will be erased! If you format a drive, parity will be updated to reflect the format, and rebuilding will result in a freshly formatted drive with no files, just like you told it to do.

 

https://wiki.unraid.net/index.php/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui

 

So all your drives are being powered by just one string from the PSU? That may well be causing a voltage drop that would cause your symptoms. Ideally you would only use one connector from each cable so you have the most possible wires running from the PSU to the backplane.

 

So I am trying to fix the drive now.  I found a spare cable and connected it from the power supply to the 3rd backplane.  I was following the instructions in the link you provided; however, I am not sure what to do next.  I'm attaching the output from the test.  I do not see anything about corruption.  Can I assume everything is OK and move on to the section "After the test and repair"?

file-system-check.txt


So I tried running the check again and it keeps coming back with the same output.  Figured I would try the xfs_repair command since that was the next step.  Anyways the output is below.  Not sure what to do.  How do I mount the filesystem and then unmount it?

 

 xfs_repair -v /dev/md7
Phase 1 - find and verify superblock...
        - block cache size set to 708168 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 2559054 tail block 2559047
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
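
The error text above already spells out the order of operations: mount to replay the log, unmount, re-run the repair, and only fall back to `-L` if the mount fails. The sketch below is one hedged way to turn that message into a command plan. The helper name `next_steps` and the mount point `/mnt/replay` are hypothetical; the commands are only built as strings and nothing is executed. It assumes the array is in maintenance mode so that nothing else has `/dev/md7` mounted.

```python
# Marker text taken from the xfs_repair output quoted in this thread.
LOG_NEEDS_REPLAY = "valuable metadata changes in a log"

def next_steps(repair_output, device="/dev/md7", mount_point="/mnt/replay"):
    """Suggest shell commands based on what xfs_repair printed.

    The mount point is a hypothetical scratch directory; nothing is run here.
    """
    if LOG_NEEDS_REPLAY not in repair_output:
        return {"try_first": [], "if_mount_fails": []}
    return {
        "try_first": [
            f"mkdir -p {mount_point}",
            f"mount -t xfs {device} {mount_point}",  # mounting replays the log
            f"umount {mount_point}",
            f"xfs_repair -v {device}",
        ],
        # Last resort: -L discards the log and may lose the most recent changes.
        "if_mount_fails": [f"xfs_repair -L {device}"],
    }

sample = (
    "ERROR: The filesystem has valuable metadata changes in a log which needs to\n"
    "be replayed.  Mount the filesystem to replay the log, and unmount it before\n"
    "re-running xfs_repair."
)
plan = next_steps(sample)
for stage, commands in plan.items():
    print(stage)
    for command in commands:
        print("   ", command)
```

Since the disk already shows as "Unmountable", the mount step may well fail here too, in which case the message itself points to `-L` as the remaining option, with the caveat it gives about possible corruption.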

